An attention algorithm for solving large scale structured $$l_{0}$$-norm penalty estimation problems

Tso-Jung Yen¹, Yu-Min Yen²
¹ Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
² Department of International Business, National Chengchi University, Taipei, Taiwan

Abstract

Keywords


References

Attouch, H., Bolte, J., & Svaiter, B. F. (2013). Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical Programming: Series A, 137, 91–129.

Bauschke, H. H., & Combettes, P. L. (2010). Convex Analysis and Monotone Operator Theory in Hilbert Spaces. New York: Springer.

Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2, 183–202.

Beck, A., & Tetruashvili, L. (2013). On the convergence of block coordinate descent type methods. SIAM Journal on Optimization, 23, 2037–2060.

Bertsekas, D. (1999). Nonlinear Programming. Massachusetts: Athena Scientific.

Bertsekas, D. (2011). Incremental proximal methods for large scale convex optimization. Mathematical Programming: Series B, 129, 163–195.

Bertsekas, D. (2015). Convex Optimization Algorithms. Massachusetts: Athena Scientific.

Bertsimas, D., King, A., & Mazumder, R. (2016). Best subset selection via a modern optimization lens. The Annals of Statistics, 44, 813–852.

Blumensath, T., & Davies, M. E. (2009). Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis, 27, 265–274.

Bolte, J., Daniilidis, A., & Lewis, A. (2007). The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM Journal on Optimization, 17, 1205–1223.

Bolte, J., Daniilidis, A., Ley, O., & Mazet, L. (2010). Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Transactions of the American Mathematical Society, 362, 3319–3363.

Bolte, J., Sabach, S., & Teboulle, M. (2014). Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming: Series A, 146, 459–494.

Combettes, P. L., & Pesquet, J.-C. (2011). Proximal splitting methods in signal processing. In H. H. Bauschke, R. S. Burachik, P. L. Combettes, V. Elser, D. R. Luke, & H. Wolkowicz (Eds.), Fixed-Point Algorithms for Inverse Problems in Science and Engineering (pp. 185–212). New York: Springer.

Fan, J., & Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B, 70, 849–911.

Fercoq, O., & Richtárik, P. (2015). Accelerated, parallel, and proximal coordinate descent. SIAM Journal on Optimization, 25, 1997–2023.

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1–22.

Hastie, T., Tibshirani, R., & Tibshirani, R. J. (2017). Extended comparisons of best subset selection, forward stepwise selection, and the lasso: following “best subset selection from a modern optimization lens” by Bertsimas, King, and Mazumder. arXiv:1707.08692v2.

Luss, R., & Teboulle, M. (2013). Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint. SIAM Review, 55, 65–98.

Ma, C., Konečný, J., Jaggi, M., Smith, V., Jordan, M. I., Richtárik, P., et al. (2017). Distributed optimization with arbitrary local solvers. Optimization Methods and Software, 32, 813–848.

Marjanovic, G., Ulfarsson, M. O., & Hero III, A. O. (2015). MIST: $$l_{0}$$ sparse linear regression with momentum. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Moreau, J.-J. (1962). Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes Rendus de l'Académie des Sciences A Mathematics, 255, 2897–2899.

Needell, D., Srebro, N., & Ward, R. (2016). Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Mathematical Programming: Series A, 155, 549–573.

Nemirovski, A., Juditsky, A., Lan, G., & Shapiro, A. (2009). Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19, 1574–1609.

Nesterov, Y. (2004). Introductory Lectures on Convex Optimization: A Basic Course. New York: Kluwer Academic Publishers.

Nesterov, Y. (2012). Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22, 341–362.

Onose, A., & Dumitrescu, B. (2015). Adaptive randomized coordinate descent for sparse systems: Lasso and greedy algorithms. IEEE Transactions on Signal Processing, 63, 4091–4101.

Parikh, N., & Boyd, S. (2013). Proximal algorithms. Foundations and Trends in Optimization, 3, 123–231.

Qu, Z., & Richtárik, P. (2016). Coordinate descent with arbitrary sampling I: algorithms and complexity. Optimization Methods and Software, 31, 829–857.

Richtárik, P., & Takáč, M. (2016). Parallel coordinate descent methods for big data optimization. Mathematical Programming: Series A, 156, 433–484.

Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22, 400–407.

Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2011). Regularization paths for Cox’s proportional hazards model via coordinate descent. Journal of Statistical Software, 39, 1–13.

Tibshirani, R., Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J., et al. (2012). Strong rules for discarding predictors in lasso-type problems. Journal of the Royal Statistical Society: Series B, 74, 245–266.

Tseng, P. (2001). Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 109, 475–494.

Wright, S. J., Nowak, R. D., & Figueiredo, M. A. T. (2009). Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57, 2479–2493.

Yen, T.-J. (2011). A majorization-minimization approach to variable selection using spike and slab priors. The Annals of Statistics, 39, 1748–1775.

Yen, T.-J., & Yen, Y.-M. (2016). Structured variable selection via prior-induced hierarchical penalty functions. Computational Statistics and Data Analysis, 96, 87–103.