Penalized estimation of flexible hidden Markov models for time series of counts
Abstract
We propose an effectively nonparametric approach to fitting hidden Markov models to time series of counts, where the state-dependent distributions are estimated in a completely data-driven way without the need to specify a parametric family of distributions. To avoid overfitting, a roughness penalty based on higher-order differences between adjacent count probabilities is added to the likelihood, which is demonstrated to produce smooth state-dependent probability mass functions. The feasibility of the suggested approach is assessed in simulation experiments, and further illustrated in two real-data applications, where we model the distributions of (i) major earthquake counts and (ii) acceleration counts of an oceanic whitetip shark (Carcharhinus longimanus) over time. The proposed methodology is implemented in the accompanying R package countHMM, which is available on CRAN.
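The penalization idea can be illustrated with a minimal sketch. It assumes the penalty is a sum of squared m-th order differences of each state's count probabilities, weighted by a state-specific smoothing parameter; the abstract does not give the exact form, so treat this as an assumption. The function names `difference_penalty` and `penalized_log_likelihood` are hypothetical and are not part of the countHMM package.

```python
def difference_penalty(pmf, order=2, lam=1.0):
    """Roughness penalty: lam times the sum of squared order-th differences.

    pmf   : list of probabilities for counts 0, 1, 2, ... (one state)
    order : order of the differences (assumed form, not taken from the paper)
    lam   : smoothing parameter; larger values enforce smoother estimates
    """
    diffs = list(pmf)
    # Apply the first-difference operator `order` times.
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return lam * sum(d * d for d in diffs)


def penalized_log_likelihood(log_lik, state_pmfs, order=2, lambdas=None):
    """Subtract one roughness penalty per state from the HMM log-likelihood.

    log_lik    : unpenalized log-likelihood, computed elsewhere
                 (for example with the forward algorithm)
    state_pmfs : list of pmfs, one per hidden state
    lambdas    : one smoothing parameter per state (defaults to 1.0 each)
    """
    if lambdas is None:
        lambdas = [1.0] * len(state_pmfs)
    penalty = sum(
        difference_penalty(pmf, order, lam)
        for pmf, lam in zip(state_pmfs, lambdas)
    )
    return log_lik - penalty
```

Under this assumed form, a second-order penalty leaves a pmf that changes linearly across adjacent counts unpenalized, whereas an isolated spike is penalized heavily, which is what pushes the estimated state-dependent probability mass functions towards smooth shapes.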