Journal of Classification

  0176-4268

  1432-1343

  United States

Publisher: Springer New York, SPRINGER

Subject areas:
Statistics, Probability and Uncertainty; Library and Information Sciences; Mathematics (miscellaneous); Psychology (miscellaneous)


Journal information

 

To publish original and valuable papers in the field of classification, numerical taxonomy, multidimensional scaling and other ordination techniques, clustering, tree structures and other network models (with somewhat less emphasis on principal components analysis, factor analysis, and discriminant analysis), as well as associated models and algorithms for fitting them. Articles will support advances in methodology while demonstrating compelling substantive applications. Comprehensive review articles are also acceptable. Contributions will represent disciplines such as statistics, psychology, biology, information retrieval, anthropology, archeology, astronomy, business, chemistry, computer science, economics, engineering, geography, geology, linguistics, marketing, mathematics, medicine, political science, psychiatry, sociology, and soil science.

Featured articles

Asking Infinite Voters ‘Who is a J?’: Group Identification Problems in $\mathbb{N}$
Volume 37 - Pages 58-65 - 2019
Federico Fioravanti, Fernando Tohmé
We analyze the problem of classifying individuals in a group N taking into account their opinions about which of them should belong to a specific subgroup N′ ⊆ N, in the case that |N| = ∞. We show that this problem is relevant in cases in which the group changes in time and/or is subject to uncertainty. The approach followed here to find the ensuing classification is by means of a Collective Identity Function (CIF) that maps the set of opinions into a subset of N. Kasher and Rubinstein (Logique & Analyse, 160, 385–395, 1997) characterized different CIFs axiomatically when |N| < ∞, in particular, the Liberal and Oligarchic aggregators. We show that in the infinite setting the liberal result is still valid but the oligarchic result no longer holds, and we give a characterization of all the aggregators satisfying the same axioms as the Oligarchic CIF. In our motivating examples, the solution obtained according to the alternative CIF is most cogent.
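To make the idea of a CIF concrete, the following is a minimal sketch for a finite group only. It assumes the usual statement of the liberal rule, under which an individual is a J exactly when that individual names themself as a J; the abstract itself does not spell the rule out. The function name `liberal_cif` and the sample opinions are hypothetical.

```python
from typing import Dict, Set


def liberal_cif(opinions: Dict[str, Set[str]]) -> Set[str]:
    """Map a profile of opinions to the subset of members classified as J.

    opinions[p] is the set of members that person p considers to be a J.
    Assumed liberal rule: p is a J if and only if p appears in opinions[p].
    """
    return {person for person, named in opinions.items() if person in named}


# Hypothetical three-person profile.
opinions = {
    "ana": {"ana", "ben"},  # ana names herself and ben
    "ben": {"ana"},         # ben does not name himself
    "cy": {"cy"},           # cy names only himself
}
selected = liberal_cif(opinions)
```

Under this rule the opinions of others about a person never affect that person's classification, which is why ben is excluded even though ana names him.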
Structural Similarity: Spectral Methods for Relaxed Blockmodeling
Volume 27 - Pages 279-306 - 2010
Ulrik Brandes, Jürgen Lerner
In this paper we propose the concept of structural similarity as a relaxation of blockmodeling in social network analysis. Most previous approaches attempt to relax the constraints on partitions, for instance, that of being a structural or regular equivalence to being approximately structural or regular, respectively. In contrast, our approach is to relax the partitions themselves: structural similarities yield similarity values instead of equivalence or non-equivalence of actors, while strictly obeying the requirement made for exact regular equivalences. Structural similarities are based on a vector space interpretation and yield efficient spectral methods that, in a more restrictive manner, have been successfully applied to difficult combinatorial problems such as graph coloring. While traditional blockmodeling approaches have to rely on local search heuristics, our framework yields algorithms that are provably optimal for specific data-generation models. Furthermore, the stability of structural similarities can be well characterized making them suitable for the analysis of noisy or dynamically changing network data.
The generation of random ultrametric matrices representing dendrograms
Volume 8 - Pages 177-200 - 1991
François-Joseph Lapointe, Pierre Legendre
Many methods and algorithms to generate random trees of many kinds have been proposed in the literature. No procedure exists, however, for the generation of dendrograms with randomized fusion levels. Randomized dendrograms can be obtained by randomizing the associated cophenetic matrix. Two algorithms are described. The first one generates completely random dendrograms, i.e., trees with a random topology, random fusion level values, and random assignment of the labels. The second algorithm uses a double-permutation procedure to randomize a given dendrogram; it proceeds by randomization of the fixed fusion levels, instead of using random fusion level values. A proof is presented that the double-permutation procedure is a Uniform Random Generation Algorithm sensu Furnas (1984), and a complete example is given.
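As background for the link between dendrograms and cophenetic matrices, the sketch below checks the ultrametric inequality d(i, j) ≤ max(d(i, k), d(k, j)) and shows that relabeling the objects (permuting rows and columns together) keeps a matrix ultrametric. It is not the authors' double-permutation procedure; the function names and the four-object example matrix are hypothetical.

```python
import random


def is_ultrametric(d, tol=1e-12):
    """Return True if d is symmetric, zero on the diagonal, and satisfies
    d[i][j] <= max(d[i][k], d[k][j]) for every triple (i, j, k)."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False
        for j in range(n):
            if abs(d[i][j] - d[j][i]) > tol:
                return False
            for k in range(n):
                if d[i][j] > max(d[i][k], d[k][j]) + tol:
                    return False
    return True


def permute_labels(d, rng):
    """Randomly reassign labels by applying one permutation to rows and columns."""
    n = len(d)
    perm = list(range(n))
    rng.shuffle(perm)
    return [[d[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


# Cophenetic matrix of a dendrogram that joins A,B at level 1,
# C,D at level 2, and the two clusters at level 3.
coph = [
    [0, 1, 3, 3],
    [1, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]
shuffled = permute_labels(coph, random.Random(0))
```

Because a label permutation only renames the objects, `shuffled` describes the same tree shape and fusion levels with the labels reassigned.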
Editorial
Volume 23 - Pages 1-2 - 2006
Improved Classification for Compositional Data Using the α-transformation
Volume 33 - Pages 243-261 - 2016
Michail Tsagris, Simon Preston, Andrew T. A. Wood
In compositional data analysis, an observation is a vector containing nonnegative values, only the relative sizes of which are considered to be of interest. Without loss of generality, a compositional vector can be taken to be a vector of proportions that sum to one. Data of this type arise in many areas including geology, archaeology, biology, economics and political science. In this paper we investigate methods for classification of compositional data. Our approach centers on the idea of using the α-transformation to transform the data and then to classify the transformed data via regularized discriminant analysis and the k-nearest neighbors algorithm. Using the α-transformation generalizes two rival approaches in compositional data analysis, one (when α = 1) that treats the data as though they were Euclidean, ignoring the compositional constraint, and another (when α = 0) that employs Aitchison’s centered log-ratio transformation. A numerical study with several real datasets shows that whether using α = 1 or α = 0 gives better classification performance depends on the dataset, and moreover that using an intermediate value of α can sometimes give better performance than using either 1 or 0.
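The two limiting cases named in the abstract can be illustrated with a small sketch. It assumes the power-transformation form (D · x_i^α / Σ_j x_j^α − 1) / α for a D-part composition, with the centered log-ratio taken as the α = 0 case, and it omits any further projection to D − 1 coordinates. The function names are hypothetical, and strictly positive parts are assumed so that logarithms exist.

```python
import math


def closure(x):
    """Rescale a vector of positive parts so that it sums to one."""
    total = sum(x)
    return [v / total for v in x]


def clr(x):
    """Aitchison's centered log-ratio: log parts minus their mean."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [value - mean for value in logs]


def alpha_transform(x, alpha):
    """Assumed form of the alpha-transformation (no projection step).

    alpha = 1 gives D * x - 1, a linear map of the raw proportions;
    alpha = 0 is defined as the limit, the centered log-ratio.
    """
    x = closure(x)
    if alpha == 0:
        return clr(x)
    d = len(x)
    powered = [v ** alpha for v in x]
    total = sum(powered)
    return [(d * p / total - 1) / alpha for p in powered]


composition = [0.2, 0.3, 0.5]
euclidean_like = alpha_transform(composition, 1)
near_log_ratio = alpha_transform(composition, 1e-6)
log_ratio = alpha_transform(composition, 0)
```

With a very small α the output approaches the centered log-ratio, which is the sense in which a single parameter interpolates between the two approaches.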
Exploratory Visual Inspection of Category Associations and Correlation Estimation in Multidimensional Subspaces
Volume 36 - Pages 177-199 - 2018
Se-Kang Kim, Joseph H. Grochowalski
In this paper, we aimed to estimate associations among categories in a multi-way contingency table. To simplify estimation and interpretation of results, we stacked multiple variables to form a two-way stacked table and analyzed it using the biplot in the correspondence analysis (CA) paradigm. The correspondence analysis biplot allowed visual inspection of category associations in a two-dimensional plane, and the CA solution numerically estimated the category relationships. We utilized parallel analysis and identified two statistically meaningful dimensions with which a plane was constructed. In the plane, we examined metric space mapping, which was converted into correlations, between school districts and categories of school-relevant variables. The results showed differential correlation patterns among school districts, and this correlational information may be useful for stakeholders or policy makers to pinpoint possible causes of low school performance and school-relevant behaviors.
Inferential Tools for Assessing Dependence Across Response Categories in Multinomial Models with Discrete Random Effects
Chiara Masci, Francesca Ieva, Anna Maria Paganoni
We propose a discrete random effects multinomial regression model to deal with estimation and inference issues in the case of categorical and hierarchical data. Random effects are assumed to follow a discrete distribution with an a priori unknown number of support points. For a K-categories response, the modelling identifies a latent structure at the highest level of grouping, where groups are clustered into subpopulations. This model does not assume independence across random effects relative to different response categories, and this provides an improvement over the multinomial semi-parametric multilevel model previously proposed in the literature. Since the category-specific random effects arise from the same subjects, the independence assumption is seldom verified in real data. To evaluate the improvements provided by the proposed model, we reproduce simulation and case studies from the literature, highlighting the strength of the method in properly modelling the real data structure and the advantages that taking into account the data dependence structure offers.
Classification and Categorical Inputs with Treed Gaussian Process Models
Volume 28 - Pages 244-270 - 2011
Tamara Broderick, Robert B. Gramacy
Recognizing the successes of treed Gaussian process (TGP) models as an interpretable and thrifty model for nonparametric regression, we seek to extend the model to classification. Both treed models and Gaussian processes (GPs) have, separately, enjoyed great success in application to classification problems. An example of the former is Bayesian CART. In the latter, real-valued GP output may be utilized for classification via latent variables, which provide classification rules by means of a softmax function. We formulate a Bayesian model averaging scheme to combine these two models and describe a Monte Carlo method for sampling from the full posterior distribution with joint proposals for the tree topology and the GP parameters corresponding to latent variables at the leaves. We concentrate on efficient sampling of the latent variables, which is important to obtain good mixing in the expanded parameter space. The tree structure is particularly helpful for this task and also for developing an efficient scheme for handling categorical predictors, which commonly arise in classification problems. Our proposed classification TGP (CTGP) methodology is illustrated on a collection of synthetic and real data sets. We assess performance relative to existing methods and thereby show how CTGP is highly flexible, offers tractable inference, produces rules that are easy to interpret, and performs well out of sample.
On the Incommensurability Phenomenon
Volume 33 - Pages 185-209 - 2016
Donniell E. Fishkind, Cencheng Shen, Youngser Park, Carey E. Priebe
Suppose that two large, multi-dimensional data sets are each noisy measurements of the same underlying random process, and principal components analysis is performed separately on the data sets to reduce their dimensionality. In some circumstances it may happen that the two lower-dimensional data sets have an inordinately large Procrustean fitting-error between them. The purpose of this manuscript is to quantify this “incommensurability phenomenon”. In particular, under specified conditions, the square Procrustean fitting-error of the two normalized lower-dimensional data sets is (asymptotically) a convex combination (via a correlation parameter) of the Hausdorff distance between the projection subspaces and the maximum possible value of the square Procrustean fitting-error for normalized data. We show how this gives rise to the incommensurability phenomenon, and we employ illustrative simulations and also use real data to explore how the incommensurability phenomenon may have an appreciable impact.