Computational Statistics
Featured scientific publications
* Data is for reference only
Open-source machine learning: R meets Weka
Computational Statistics - Volume 24 - Pages 225-232 - 2008
Two of the prime open-source environments available for machine/statistical learning in data mining and knowledge discovery are the software packages Weka and R, which emerged from the machine learning and statistics communities, respectively. To make the different sets of tools from both environments available in a single unified system, an R package, RWeka, is proposed that interfaces Weka’s functionality to R. With only a thin layer of (mostly R) code, it provides a set of general interface generators that can set up interface functions with the usual “R look and feel”, re-using Weka’s standardized interface of learner classes (including classifiers, clusterers, associators, filters, loaders, savers, and stemmers) with associated methods.
An exploration of National Weather Service daily forecasts using R Shiny
Computational Statistics - Volume 38 - Pages 1173-1191 - 2023
Weather forecasts affect the daily lives of billions of people globally. Accurate forecasts can help combat and effectively mitigate damage caused by extreme weather. Conversely, faulty forecasts can lead to unnecessary financial investments and a waste of resources. Our work explores the extent of variability in the errors of National Weather Service predictions, as observed in 113 cities in the United States between July 1, 2014 and September 1, 2017, and attempts to model the distribution of those errors. We also deliver an interactive tool that lets future researchers explore the actual and forecast weather data and expose hidden patterns in the data.
Skew exponential power stochastic volatility model for analysis of skewness, non-normal tails, quantiles and expectiles
Computational Statistics - Volume 31 - Pages 49-88 - 2015
This paper proposes a unified framework to analyse the skewness, tail heaviness, quantiles and expectiles of the return distribution based on a stochastic volatility model using a new parametrisation of the skew exponential power (SEP) distribution. The SEP distribution can express a wide range of distribution shapes through two shape parameters and one skewness parameter. Since the asymmetric Laplace and skew normal distributions are included as special cases, the proposed model is related to quantile regression and expectile regression. Efficient and simple Markov chain Monte Carlo estimation methods are also described. The proposed model is demonstrated using simulated data and real data on daily returns of foreign exchange rates.
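The link to quantile regression mentioned in this abstract rests on a standard fact: maximizing an asymmetric Laplace likelihood over its location parameter is equivalent to minimizing the check (pinball) loss, and the minimizer of that loss is a quantile. The sketch below illustrates only this fact in plain Python. It is not the paper's SEP model or its MCMC scheme, and the function names and toy data are illustrative assumptions.

```python
def pinball_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(data, q, tau):
    """Sum of pinball losses of the residuals x - q over the data."""
    return sum(pinball_loss(x - q, tau) for x in data)


def empirical_quantile_by_loss(data, tau):
    """Return a data point that minimizes the total pinball loss.

    The loss is piecewise linear in q with kinks at the observations,
    so a minimizer can always be found among the observations.
    """
    return min(sorted(data), key=lambda q: total_loss(data, q, tau))


# Toy returns (made-up numbers, for illustration only).
returns = [-2.0, -1.0, -0.5, 0.0, 0.3, 0.8, 1.5, 2.5, 4.0]
median = empirical_quantile_by_loss(returns, 0.5)  # tau = 0.5 gives the median
lower = empirical_quantile_by_loss(returns, 0.1)   # tau = 0.1 gives a lower-tail quantile
```

Replacing the absolute-value-type loss with an asymmetrically weighted squared loss would give expectiles instead of quantiles, which is the analogous connection for the skew normal special case.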
Classification trees with soft splits optimized for ranking
Computational Statistics - Volume 34 - Pages 763-786 - 2019
We consider softening of splits in classification trees generated from multivariate numerical data. This methodology improves the quality of the ranking of the test cases as measured by the AUC. Several ways to determine softening parameters are introduced and compared, including the softening algorithm present in the standard methods C4.5 and C5.0. In the first part of the paper, a few softening settings determined only from the ranges of training data in the tree branches are explored. The trees softened with these settings are used to study the effect of using the Laplace correction together with soft splits. In a later part we introduce methods which maximize the classifier’s performance on the training set over the domain of the softening parameters. The non-linear optimization algorithm Nelder–Mead is used and various target functions are considered. The target function evaluating the AUC on the training set is compared with functions that sum some transformation of the score error over the training cases. Several data sets from the UCI repository are used in experiments.
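To make the idea concrete, here is a minimal sketch of why softening a split can change the AUC: a hard split assigns one score per leaf and so produces many ties, while a soft split blends the two leaf scores near the threshold. The linear blending rule, the leaf scores, the width parameter, and the toy data are all assumptions made for illustration. This is not the softening used in C4.5/C5.0 or the optimized variants from the paper, and softening need not improve the AUC on other data.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hard_score(x, threshold=0.5, left=0.2, right=0.8):
    """One-split tree: each case receives the score of the leaf it falls into."""
    return right if x > threshold else left


def soft_score(x, threshold=0.5, width=0.3, left=0.2, right=0.8):
    """Softened split (hypothetical rule): the weight of the right leaf rises
    linearly from 0 to 1 over [threshold - width, threshold + width]."""
    w = min(1.0, max(0.0, (x - (threshold - width)) / (2 * width)))
    return (1 - w) * left + w * right


# Toy one-dimensional data (made up for illustration).
xs = [0.1, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

auc_hard = auc([hard_score(x) for x in xs], labels)
auc_soft = auc([soft_score(x) for x in xs], labels)
```

On this toy data the hard split yields only two distinct scores, while the softened scores order the cases within each branch, which is what allows the ranking (and hence the AUC) to differ.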
Model selection criteria based on cross-validatory concordance statistics
Computational Statistics - Volume 33 - Pages 595-621 - 2017
In the logistic regression framework, we present the development and investigation of three model selection criteria based on cross-validatory analogues of the traditional and adjusted c-statistics. These criteria are designed to estimate three corresponding measures of predictive error: the model misspecification prediction error, the fitting sample prediction error, and the sum of prediction errors. We aim to show that these estimators serve as suitable model selection criteria, facilitating the identification of a model that appropriately balances goodness-of-fit and parsimony, while achieving generalizability. We examine the properties of the selection criteria via an extensive simulation study designed as a factorial experiment. We then employ these measures in a practical application based on modeling the occurrence of heart disease.
Investigating the competitive assumption of Multinomial Logit models of brand choice by nonparametric modeling
Computational Statistics - Volume 19 - Pages 635-657 - 2004
The Multinomial Logit (MNL) model is still the only viable option to study nonlinear responsiveness of utility to covariates nonparametrically. This research investigates whether the MNL structure of inter-brand competition is a reasonable assumption, so that when the utility function is estimated nonparametrically, the IIA assumption does not bias the result. For this purpose, the authors compare the performance of two comparable nonparametric choice models that differ in one aspect: one assumes an MNL competitive structure and the other infers the pattern of brands’ competition nonparametrically from data.
Dimension reduction in functional regression with categorical predictor
Computational Statistics - Volume 32 - Pages 585-609 - 2016
In the present paper, we consider dimension reduction methods for functional regression with a scalar response and predictors that include a random curve and a categorical random variable. To deal with the categorical random variable, we propose three potential dimension reduction methods: partial functional sliced inverse regression, marginal functional sliced inverse regression and conditional functional sliced inverse regression. Furthermore, we investigate the relationships among the three methods. In addition, a new modified BIC criterion for determining the dimension of the effective dimension reduction space is developed. Real and simulated data examples are then presented to show the effectiveness of the proposed methods.
Simultaneous Bayesian modelling of skew-normal longitudinal measurements with non-ignorable dropout
Computational Statistics - 2022
Testing heterogeneity in quantile regression: a multigroup approach
Computational Statistics - Pages 1-24 - 2023
The paper introduces a multigroup approach to assess group effects in quantile regression. The procedure estimates the same regression model at different quantiles and for different groups of observations. Such groups are defined by the levels of one or more stratification variables. The proposed approach exploits a computational procedure to test group effects. In particular, a parametric bootstrap test and a permutation test are compared on artificial data, taking into account different sample sizes and comparing their performance in detecting low, medium, and high differences among coefficients pertaining to different groups. An empirical analysis of MOOC students’ performance is used to show the proposal in action. The effect of the two main drivers of performance, learning and engagement, is explored at different conditional quantiles, comparing self-paced courses with instructor-paced courses offered on the EdX platform.
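The general shape of a permutation test for a group effect at a given quantile can be sketched as follows. This is a deliberately simplified stand-in: it compares plain sample quantiles of two groups instead of quantile regression coefficients, and the function names, the order-statistic quantile rule, the number of permutations, and the toy data are all assumptions, not the procedure from the paper.

```python
import random


def sample_quantile(values, tau):
    """Order-statistic quantile without interpolation (a simple convention)."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(tau * len(ordered)))
    return ordered[index]


def permutation_pvalue(group_a, group_b, tau, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in the tau-th quantile.

    Group labels are reshuffled n_perm times; the p-value is the share of
    reshuffles whose absolute quantile difference is at least as large as
    the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(sample_quantile(group_a, tau) - sample_quantile(group_b, tau))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sample_quantile(pooled[:n_a], tau)
                   - sample_quantile(pooled[n_a:], tau))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy groups (made up): group_b is group_a shifted upwards by 5.
group_a = [i / 10 for i in range(30)]
group_b = [5 + i / 10 for i in range(30)]

p_shifted = permutation_pvalue(group_a, group_b, tau=0.5)
p_identical = permutation_pvalue(group_a, group_a, tau=0.5)
```

With two identical groups the observed difference is zero, so every reshuffle counts as at least as extreme and the p-value is 1. With the strongly shifted groups, very few reshuffles should reach the observed difference, so the p-value is small.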
Computation and analysis in robust regression model selection using stochastic complexity
Computational Statistics - Volume 14 - Pages 293-314 - 1999
In this paper, we study a stochastic complexity approach to model selection in robust linear regression. The computational aspects and applications of this approach are the focus of the study. In particular, we provide both procedures and a package of S language programs for computing the stochastic complexity and carrying out the associated model selection. In addition, we discuss how a probability distribution on the set of candidate models can be induced by stochastic complexity and how this distribution can be used in diagnostics to measure the probability that a candidate model is selected. We further discuss some model selection strategies for the case of many potential explanatory variables. Finally, examples and a simulation study are presented to assess the finite-sample performance of our methods.
#Robust linear regression #stochastic complexity #model selection #probability distribution #potential explanatory variables
Total: 1,140