Unsupervised learning on U.S. weather forecast performance

Computational Statistics - Volume 38 - Pages 1193-1213 - 2023
Chuyuan Lin1, Ying Yu1, Lucas Y. Wu1, Jiguo Cao1
1Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada

Abstract

Climate events and weather predictions have a major impact on human activities. To assess the accuracy of weather prediction, we applied functional principal component analysis (FPCA) to investigate the main patterns of variation in U.S. weather prediction errors over a period of 3 years. We then grouped the U.S. states by the similarity of their weather forecast performance using two types of functional clustering approaches: the filtering method and the model-based method. The strengths and weaknesses of each clustering method were examined through simulation studies, and the clustering approaches were then applied to U.S. weather data from 2014 to 2017. Clustering revealed cluster-specific patterns visually, and cluster-to-cluster differences were quantified to identify the most and least predictable U.S. states.
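As a rough illustration of the filtering idea mentioned in the abstract, the sketch below reduces each curve to a few FPCA scores and then clusters those scores with k-means. It is not the authors' implementation: the synthetic "forecast error" curves, the helper names `fpca_scores` and `kmeans`, the use of two components, and the discretized FPCA via an SVD on a common time grid are all assumptions made for this example. The model-based method is not sketched here.

```python
import numpy as np

def fpca_scores(curves, n_components=2):
    """Discretized FPCA for curves observed on a common grid.

    curves: array of shape (n_curves, n_timepoints).
    Returns the scores, the discretized eigenfunctions, and the
    proportion of variance explained by each retained component.
    """
    centered = curves - curves.mean(axis=0)
    # Rows of vt are the principal directions (discretized eigenfunctions).
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, vt[:n_components], explained[:n_components]

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the last set of centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Hypothetical data: 20 "states", each with an error curve on 50 time points.
rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 50)
group_a = 1.0 + np.sin(2 * np.pi * t) + rng.normal(0.0, 0.1, size=(10, t.size))
group_b = 3.0 + 0.5 * np.cos(2 * np.pi * t) + rng.normal(0.0, 0.1, size=(10, t.size))
curves = np.vstack([group_a, group_b])

# Filtering step (curves -> scores), then clustering step (scores -> labels).
scores, eigenfunctions, explained = fpca_scores(curves, n_components=2)
labels = kmeans(scores, k=2)
print("variance explained:", np.round(explained, 3))
print("cluster labels:", labels)
```

In this toy setting the two groups differ mainly in level and shape, so the first component captures almost all of the variation and the scores separate cleanly. With real forecast errors one would typically smooth the curves first and choose the number of components and clusters from the data.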
