Classification based on multivariate mixed type longitudinal data with an application to the EU-SILC database
Advances in Data Analysis and Classification - Volume 17 - Pages 369-406 - 2022
Jan Vávra, Arnošt Komárek
Although many present day studies gather data of a diverse nature (numeric
quantities, binary indicators or ordered categories) on the same units
repeatedly over time, only a limited number of approaches exist in the
literature to analyse so-called mixed-type longitudinal data. We present
a statistical model capable of jointly modelling several mixed-type outcomes,
which also accounts for possib...
Bayesian nonstationary Gaussian process models via treed process convolutions
Advances in Data Analysis and Classification - Volume 13 - Pages 797-818 - 2018
Waley W. J. Liang, Herbert K. H. Lee
The Gaussian process is a common model in a wide variety of applications, such
as environmental modeling, computer experiments, and geology. Two major
challenges often arise: First, assuming that the process of interest is
stationary over the entire domain often proves to be untenable. Second, the
traditional Gaussian process model formulation is computationally inefficient
for large datasets. In ...
How many data clusters are in the Galaxy data set?
Advances in Data Analysis and Classification - Volume 16 - Pages 325-349 - 2021
Bettina Grün, Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter
In model-based clustering, the Galaxy data set is often used as a benchmark data
set to study the performance of different modeling approaches. Aitkin (Stat
Model 1:287–304) compares maximum likelihood and Bayesian analyses of the Galaxy
data set and expresses reservations about the Bayesian approach due to the fact
that the prior assumptions imposed remain rather obscure while playing a major
rol...
Convex clustering for binary data
Advances in Data Analysis and Classification - Volume 13 - Pages 991-1018 - 2018
Hosik Choi, Seokho Lee
We present a new clustering algorithm for multivariate binary data. The new
algorithm is based on the convex relaxation of hierarchical clustering, which is
achieved by considering the binomial likelihood as a natural distribution for
binary data and by formulating convex clustering using a pairwise penalty on
prototypes of clusters. Under convex clustering, we show that the typical
$\ell_1$ pa...
Benchmarking distance-based partitioning methods for mixed-type data
Advances in Data Analysis and Classification - Volume 17 - Pages 701-724 - 2022
Efthymios Costa, Ioanna Papatsouma, Angelos Markos
Clustering mixed-type data, that is, observation-by-variable data that consist
of both continuous and categorical variables, poses novel challenges. Foremost
among these challenges is the choice of the most appropriate clustering method
for the data. This paper presents a benchmarking study comparing eight
distance-based partitioning methods for mixed-type data in terms of cluster
recovery performa...
Variational inference for semiparametric Bayesian novelty detection in large datasets
Advances in Data Analysis and Classification - Pages 1-23 - 2023
Luca Benedetti, Eric Boniardi, Leonardo Chiani, Jacopo Ghirri, Marta Mastropietro, Andrea Cappozzo, Francesco Denti
After being trained on a fully-labeled training set, where the observations are
grouped into a certain number of known classes, novelty detection methods aim to
classify the instances of an unlabeled test set while allowing for the presence
of previously unseen classes. These models are valuable in many areas, ranging
from social network and food adulteration analyses to biology, where an evolving...
Editorial
Advances in Data Analysis and Classification - 2011
Hans‐Hermann Bock, Wolfgang Gaul, Akinori Okada, Maurizio Vichi
Parsimonious cluster systems
Advances in Data Analysis and Classification - Volume 3 - Pages 189-204 - 2009
François Brucker, Alain Gély
We introduce in this paper a new clustering structure, parsimonious cluster
systems, which generalizes phylogenetic trees. We characterize it as the set of
hypertrees stable under restriction and prove that this set is in bijection with
a known dissimilarity model: chordal quasi-ultrametrics. We then present one
possible way to graphically represent elements of this model.
Multiple imputation in principal component analysis
Advances in Data Analysis and Classification - Volume 5 - Pages 231-246 - 2011
Julie Josse, Jérôme Pagès, François Husson
The available methods to handle missing values in principal component analysis
only provide point estimates of the parameters (axes and components) and
estimates of the missing values. To take into account the variability due to
missing values, a multiple imputation method is proposed. First, a method to
generate multiple imputed data sets from a principal component analysis model is
defined. Then, ...