Machine Learning

Notable scientific publications

* Data shown is for reference only

Supersparse linear integer models for optimized medical scoring systems
Machine Learning - 2016
Berk Ustun, Cynthia Rudin
Stacked Regressions
Machine Learning - Volume 24 - Pages 49-64 - 1996
Leo Breiman
Stacking regressions is a method for forming linear combinations of different predictors to give improved prediction accuracy. The idea is to use cross-validation data and least squares under non-negativity constraints to determine the coefficients in the combination. Its effectiveness is demonstrated in stacking regression trees of different sizes and in a simulation stacking linear subset and ridge regressions. Reasons why this method works are explored. The idea of stacking originated with Wolpert (1992).
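The recipe in the abstract can be sketched in a few lines: build a matrix of cross-validated predictions from the base predictors, then solve a least-squares problem under non-negativity constraints to get the stacking weights. The data, the two base predictors, and the projected-gradient NNLS solver below are all illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def fit_ols(Xtr, ytr, cols):
    """Least-squares fit on a subset of columns; returns a predictor."""
    beta = np.linalg.lstsq(Xtr[:, cols], ytr, rcond=None)[0]
    return lambda Xte: Xte[:, cols] @ beta

# Two toy base predictors: the full linear model and a one-feature model.
subsets = [[0, 1, 2], [0]]

# Z holds the cross-validated predictions of each base predictor.
K = 5
Z = np.zeros((n, len(subsets)))
for te in np.array_split(np.arange(n), K):
    tr = np.setdiff1d(np.arange(n), te)
    for j, cols in enumerate(subsets):
        Z[te, j] = fit_ols(X[tr], y[tr], cols)(X[te])

# Least squares under non-negativity constraints, solved here by
# projected gradient descent (any NNLS solver would do).
w = np.zeros(len(subsets))
step = 1.0 / np.linalg.norm(Z.T @ Z, 2)
for _ in range(2000):
    w = np.maximum(0.0, w - step * (Z.T @ (Z @ w - y)))

print(w)  # non-negative stacking weights
```

Because the weights are constrained to be non-negative, the stacked combination can never be driven to overfit the cross-validation residuals with large cancelling coefficients, which is part of Breiman's explanation for why stacking works.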
Editorial: Inductive Logic Programming is Coming of Age
Machine Learning - Volume 44 - Pages 207-209 - 2001
Peter Flach, Sašo Džeroski
Editorial: Advice to Machine Learning Authors
Machine Learning - Volume 5 - Pages 233-237 - 1990
Pat Langley
A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing
Machine Learning - Volume 99 - Pages 137-163 - 2014
Karthik Devarajan, Guoli Wang, Nader Ebrahimi
Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix $$V$$ into the product of two nonnegative matrices, $$W$$ and $$H$$, such that $$V \sim WH$$. It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi’s divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for $$W$$ and $$H$$. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data.
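A minimal sketch of the setting the abstract describes: NMF with the classical multiplicative updates for the generalized KL (Poisson) divergence, the member of the divergence family that the paper generalizes via Renyi's divergence. The data and rank below are made up for illustration; the matrix names $$V$$, $$W$$, $$H$$ follow the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.poisson(5.0, size=(20, 15)).astype(float) + 1e-9  # keep V > 0
r = 3  # rank of the factorization, chosen arbitrarily here

W = rng.random((20, r)) + 0.1
H = rng.random((r, 15)) + 0.1

# Multiplicative updates for the generalized KL divergence.
# Each update keeps W and H nonnegative and does not increase the divergence.
for _ in range(300):
    WH = W @ H
    H *= (W.T @ (V / WH)) / W.sum(axis=0, keepdims=True).T
    WH = W @ H
    W *= ((V / WH) @ H.T) / H.sum(axis=1, keepdims=True).T

# Generalized KL divergence between V and its reconstruction WH.
kl = np.sum(V * np.log(V / (W @ H)) - V + W @ H)
print(kl)
```

The monotonicity that the multiplicative updates enjoy for this divergence is exactly the property the paper proves for its unified family of updates.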
Selective Sampling for Nearest Neighbor Classifiers
Machine Learning - Volume 54 - Pages 125-152 - 2004
Michael Lindenbaum, Shaul Markovitch, Dmitry Rusakov
Most existing inductive learning algorithms work under the assumption that their training examples are already tagged. There are domains, however, where the tagging procedure requires significant computation resources or manual labor. In such cases, it may be beneficial for the learner to be active, intelligently selecting the examples for labeling with the goal of reducing the labeling cost. In this paper we present LSS, a lookahead algorithm for selective sampling of examples for nearest neighbor classifiers. The algorithm looks for the example with the highest utility, taking its effect on the resulting classifier into account. Computing the expected utility of an example requires estimating the probability of its possible labels. We propose to use the random field model for this estimation. The LSS algorithm was evaluated empirically on seven real and artificial data sets, and its performance was compared to other selective sampling algorithms. The experiments show that the proposed algorithm outperforms other methods in terms of average error rate and stability.
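To make the selective-sampling setting concrete, here is a deliberately simplified sketch: from an unlabeled pool, repeatedly query the point whose nearest labeled neighbor is farthest away, a crude utility proxy for a 1-NN classifier. Note this is not the LSS algorithm, whose utility involves lookahead and a random-field label model; the data, budget, and selection rule below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # "oracle" labels, hidden from the learner

labeled = [0, 1]  # start with two arbitrarily labeled examples
pool = [i for i in range(100) if i not in labeled]

def nn_predict(q):
    """1-NN prediction using only the labeled examples."""
    d = np.linalg.norm(X[labeled] - q, axis=1)
    return y[labeled[int(np.argmin(d))]]

for _ in range(20):  # label budget of 20 queries
    # Distance of each pool point to its nearest labeled example;
    # query the point that is least covered by current labels.
    d = np.linalg.norm(X[pool][:, None, :] - X[labeled][None, :, :], axis=2)
    pick = pool[int(np.argmax(d.min(axis=1)))]
    labeled.append(pick)  # ask the oracle for this label
    pool.remove(pick)

acc = np.mean([nn_predict(X[i]) == y[i] for i in pool])
print(acc)
```

The point of the comparison in the paper is that a lookahead utility, which estimates how each candidate label would change the resulting classifier, outperforms simple geometric heuristics like this one.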
Learning Probabilistic Automata and Markov Chains via Queries
Machine Learning - Volume 8 - Pages 151-166 - 1992
Wen-Guey Tzeng
We investigate the problem of learning probabilistic automata and Markov chains via queries in the teacher-student learning model. Probabilistic automata and Markov chains are probabilistic extensions of finite state automata and have similar structures. We discuss some natural oracles associated with probabilistic automata and Markov chains. We present polynomial-time algorithms for learning probabilistic automata and Markov Chains using these oracles.
Announcements
Machine Learning - Volume 5 - Pages 114-115 - 1990
Fast greedy $$\mathcal {C}$$ -bound minimization with guarantees
Machine Learning - Volume 109 - Pages 1945-1986 - 2020
Baptiste Bauvin, Cécile Capponi, Jean-Francis Roy, François Laviolette
The $$\mathcal {C}$$-bound is a tight bound on the true risk of a majority vote classifier that relies on the individual quality and pairwise disagreement of the voters and provides PAC-Bayesian generalization guarantees. Based on this bound, MinCq is a classification algorithm that returns a dense distribution on a finite set of voters by minimizing it. Introduced later and inspired by boosting, CqBoost uses a column generation approach to build a sparse $$\mathcal {C}$$-bound optimal distribution on a possibly infinite set of voters. However, both approaches have a high computational learning time because they minimize the $$\mathcal {C}$$-bound by solving a quadratic program. Yet, one advantage of CqBoost is its experimental ability to provide sparse solutions. In this work, we address the problem of accelerating the $$\mathcal {C}$$-bound minimization process while keeping the sparsity of the solution and without losing accuracy. We present CB-Boost, a computationally efficient classification algorithm relying on a greedy, boosting-based $$\mathcal {C}$$-bound optimization. An in-depth analysis proves the optimality of the greedy minimization process and quantifies the decrease of the $$\mathcal {C}$$-bound operated by the algorithm. Generalization guarantees are then drawn based on already existing PAC-Bayesian theorems. In addition, we experimentally evaluate the relevance of CB-Boost in terms of the three main properties we expect about it: accuracy, sparsity, and computational efficiency compared to MinCq, CqBoost, Adaboost and other ensemble methods. As observed in these experiments, CB-Boost not only achieves results comparable to the state of the art, but also provides $$\mathcal {C}$$-bound sub-optimal weights with very little computational demand while keeping the sparsity property of CqBoost.
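The quantity all of these algorithms minimize is easy to state: for a weighted vote over binary voters $$h_k(x) \in \{-1,+1\}$$ with margin $$M(x,y) = y \sum_k w_k h_k(x)$$, the $$\mathcal {C}$$-bound is $$1 - \mathbb{E}[M]^2 / \mathbb{E}[M^2]$$, which upper-bounds the majority vote's risk whenever $$\mathbb{E}[M] > 0$$. The sketch below computes it on synthetic voters with uniform weights; the data, voter accuracies, and weights are assumptions for illustration, and no minimization (MinCq, CqBoost, or CB-Boost) is performed.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 500, 10
y = rng.choice([-1, 1], size=n)
# Weak voters: each agrees with y independently 65% of the time.
H = y[:, None] * rng.choice([1, -1], p=[0.65, 0.35], size=(n, K))
w = np.full(K, 1.0 / K)  # uniform vote weights

M = y * (H @ w)  # per-example margins of the weighted vote
c_bound = 1.0 - M.mean() ** 2 / np.mean(M ** 2)
vote_error = np.mean(np.sign(H @ w) != y)
print(c_bound, vote_error)
```

The two moments of the margin encode exactly the "individual quality and pairwise disagreement" the abstract mentions: the first moment reflects voter accuracy, and the second moment grows when voters err together rather than disagree.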
Total: 1,832