Supervised Feature Selection via Quadratic Surface Regression with $$l_{2,1}$$-Norm Regularization

Annals of Data Science - pp. 1-29 - 2024
Changlin Wang1,2, Zhixia Yang1,2, Junyou Ye1,2, Xue Yang1,2, Manchen Ding1,2
1College of Mathematics and Systems Science, Xinjiang University, Urumqi, China
2Institute of Mathematics and Physics, Xinjiang University, Urumqi, China

Abstract

This paper proposes a supervised kernel-free quadratic surface regression method for feature selection (QSR-FS). The method fits a quadratic function to each class and incorporates it into a least squares loss function. An $$l_{2,1}$$-norm regularization term is introduced to obtain a sparse solution, and a feature weight vector is constructed from the coefficients of the quadratic functions of all classes to quantify the importance of each feature. An alternating iteration algorithm is designed to solve the resulting optimization problem. The computational complexity of the algorithm is analyzed, and the iterative formula is reformulated to further accelerate computation. In the experiments, feature selection and downstream classification tasks are performed on eight datasets from different domains, and the results are evaluated using relevant metrics. Analyses of feature selection interpretability and parameter sensitivity are also provided. The experimental results demonstrate the feasibility and effectiveness of the method.
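The full model and its accelerated iterative formula are given in the paper itself. As a rough illustration of the ingredients named in the abstract, the following Python sketch (all names hypothetical, not the authors' code) combines an explicit quadratic feature map with $$l_{2,1}$$-regularized least squares, solved by the standard alternating reweighting scheme of Nie et al. (2010), and reads a per-feature importance score off the learned coefficient rows:

```python
import numpy as np

def quadratic_map(X):
    """Explicit quadratic feature map: [x, upper-triangular x_i * x_j terms].
    A kernel-free stand-in for the paper's quadratic surface; hypothetical helper."""
    n, d = X.shape
    iu = np.triu_indices(d)
    quad = (X[:, :, None] * X[:, None, :])[:, iu[0], iu[1]]
    return np.hstack([X, quad])

def qsr_fs_sketch(X, Y, lam=1.0, n_iter=50, eps=1e-8):
    """Minimal sketch of l2,1-regularized least squares on quadratic features:
        min_W ||Z W - Y||_F^2 + lam * ||W||_{2,1},
    solved by alternating between a diagonal reweighting matrix D and a
    closed-form ridge-like update for W (cf. Nie et al., 2010).
    X: (n, d) data; Y: (n, c) one-hot class indicator matrix."""
    Z = quadratic_map(X)                      # (n, p) design matrix
    n, p = Z.shape
    # Warm start with a plain ridge solution so the row norms are nonzero
    W = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ Y)
    for _ in range(n_iter):
        # D depends on the current row norms of W; eps guards division by zero
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
        # Closed-form update: W = (Z^T Z + lam * D)^{-1} Z^T Y
        W = np.linalg.solve(Z.T @ Z + lam * D, Z.T @ Y)
    # Illustration only: score the d linear-term rows of W; the paper builds
    # its feature weight vector from the coefficients of all quadratic terms.
    d = X.shape[1]
    scores = np.sqrt((W[:d] ** 2).sum(axis=1))
    return W, scores
```

In use, features would be ranked by `scores` and the top-ranked subset passed to a downstream classifier; the row-sparsity induced by the $$l_{2,1}$$ norm is what drives whole features (rather than individual coefficients) toward zero.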
