One-Pass Additive-Error Subset Selection for \(\ell_p\) Subspace Approximation and (k, p)-Clustering
Abstract
We consider the problem of subset selection for \(\ell_p\) subspace approximation and (k, p)-clustering. Our aim is to efficiently find a small subset of data points such that solving the problem optimally on this subset gives a good approximation to solving it optimally on the original input. For \(\ell_p\) subspace approximation, the previously known subset selection algorithms, based on the volume sampling and adaptive sampling of Deshpande and Varadarajan (STOC’07, 2007), handle the general case of \(p \in [1, \infty)\) but require multiple passes over the data. In this paper, we give a one-pass subset selection algorithm with an additive approximation guarantee for \(\ell_p\) subspace approximation, for any \(p \in [1, \infty)\). Earlier subset selection algorithms that give a one-pass multiplicative \((1+\epsilon)\) approximation work only in special cases: Cohen et al. (SODA’17, 2017) give a one-pass subset selection with a multiplicative \((1+\epsilon)\) approximation guarantee for the special case of \(\ell_2\) subspace approximation, and Mahabadi et al. (STOC’20, 2020) give a one-pass noisy subset selection with a \((1+\epsilon)\) approximation guarantee for \(\ell_p\) subspace approximation when \(p \in \{1, 2\}\). Our subset selection algorithm gives a weaker, additive approximation guarantee, but it works for any \(p \in [1, \infty)\).
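For reference, the \(\ell_p\) subspace approximation objective can be written as follows; this is the standard formulation, with the notation chosen here rather than taken from the paper:

```latex
% Given points a_1, ..., a_n in R^d and a target dimension k, find a
% k-dimensional linear subspace V of R^d minimizing the sum of p-th
% powers of the Euclidean distances to V:
\operatorname{err}_p(V) \;=\; \sum_{i=1}^{n} d(a_i, V)^{p},
\qquad\text{where}\quad
d(a_i, V) \;=\; \min_{x \in V}\, \lVert a_i - x \rVert_2 .
```

For \(p = 2\) this is the classical rank-k approximation problem, solvable via the singular value decomposition; for other values of p no such closed-form solution is known, which is one motivation for subset selection.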
We also consider (k, p)-clustering, where the task is to group the data points into k clusters such that the sum of the p-th powers of the distances from the points to their cluster centers is minimized, for \(p \in [1, \infty)\). The known subset selection algorithms for this problem are based on \(D^p\) sampling, due to Wei (NIPS’16, 2016), which extends the \(D^2\) sampling proposed by Arthur and Vassilvitskii (SODA’07, 2007).
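The following minimal NumPy sketch makes the sampling scheme concrete (the function name and interface are ours, not the paper's); for \(p = 2\) it is exactly k-means++ seeding:

```python
import numpy as np

def dp_sampling(X, k, p, seed=None):
    """D^p sampling: choose the first center uniformly at random, then
    choose each subsequent center with probability proportional to
    d(x, C)^p, the p-th power of the distance from x to its nearest
    already-chosen center.  The centers C seed a solution to the
    (k, p)-clustering cost  sum_x min_{c in C} ||x - c||^p."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = [rng.integers(n)]
    # d_min[i] = distance from X[i] to its nearest chosen center so far
    d_min = np.linalg.norm(X - X[idx[0]], axis=1)
    for _ in range(k - 1):
        w = d_min ** p
        idx.append(rng.choice(n, p=w / w.sum()))
        d_min = np.minimum(d_min, np.linalg.norm(X - X[idx[-1]], axis=1))
    return X[idx]
```

Each of the k − 1 rounds needs the distance from every point to the centers chosen so far, so a naive streaming implementation takes one pass per sampled center.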
Due to this inherently adaptive nature of \(D^p\) sampling, these algorithms require multiple passes over the input. In this work, we give a one-pass subset selection for (k, p)-clustering that achieves a constant factor approximation with respect to the optimal solution, up to an additive error term. Bachem et al. (NIPS’16, 2016) also give a one-pass subset selection for k-means, i.e., the case \(p = 2\); our result solves the more general problem for \(p \in [1, \infty)\). At the core, our contribution lies in a one-pass MCMC-based subset selection algorithm whose cost on the sampled points closely approximates the corresponding optimal cost, with high probability.
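To illustrate the general flavour of MCMC-based seeding, the sketch below approximates each \(D^p\) sample with a short Metropolis chain run over a uniform subsample S of the input, in the spirit of the K-MC\(^2\) sampler of Bachem et al. generalized from \(p = 2\) to arbitrary p. It is an illustrative sketch under these assumptions, not the algorithm analysed in the paper, and all names and parameters are ours:

```python
import numpy as np

def nearest_dist_p(x, centers, p):
    """p-th power of the distance from x to its nearest current center."""
    return min(np.linalg.norm(x - c) for c in centers) ** p

def mcmc_dp_seeding(S, k, p, chain_len=200, seed=None):
    """Approximate D^p seeding via Metropolis chains.  S is a uniform
    subsample of the input, collectable in a single pass by reservoir
    sampling, so no further passes over the stream are needed.
    Illustrative sketch only; not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    m = S.shape[0]
    centers = [S[rng.integers(m)]]              # first center: uniform
    for _ in range(k - 1):
        x = rng.integers(m)                     # chain state: index into S
        dx = nearest_dist_p(S[x], centers, p)
        for _ in range(chain_len):
            y = rng.integers(m)                 # uniform proposal
            dy = nearest_dist_p(S[y], centers, p)
            # Metropolis step: accept with probability min(1, dy / dx),
            # so the chain's stationary law is D^p restricted to S.
            if rng.random() * max(dx, 1e-300) < dy:
                x, dx = y, dy
        centers.append(S[x])
    return np.array(centers)
```

Longer chains bring each sample closer to the exact \(D^p\) distribution; the price of the single pass is that the sampling, and hence the resulting guarantee, is only approximate.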
References
Feldman, D.: Introduction to Core-sets: an Updated Survey (2020)
Frieze, A.M., Kannan, R., Vempala, S.S.: Fast Monte-Carlo algorithms for finding low-rank approximations. J. ACM 51(6), 1025–1041 (2004). https://doi.org/10.1145/1039488.1039494
Wei, K., Iyer, R., Bilmes, J.: Submodularity in data subset selection and active learning. In: Proceedings of the 32nd International Conference on Machine Learning, vol. 37, pp. 1954–1963 (2015)
Boutsidis, C., Mahoney, M.W., Drineas, P.: An improved approximation algorithm for the column subset selection problem. In: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 968–977 (2009). SIAM
Drineas, P., Mahoney, M.W., Muthukrishnan, S.: Relative-error CUR matrix decompositions. SIAM J. Matrix Anal. Appl. 30(2), 844–881 (2008). https://doi.org/10.1137/07070471X
Mahoney, M.W., Drineas, P.: CUR matrix decompositions for improved data analysis. Proc. Natl. Acad. Sci. USA 106(3), 697–702 (2009). https://doi.org/10.1073/pnas.0803205106
Wang, S., Zhang, Z.: Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling. J. Mach. Learn. Res. 14(1), 2729–2769 (2013)
Mahoney, M.W.: Randomized algorithms for matrices and data. Found. Trends Mach. Learn. 3(2), 123–224 (2011). https://doi.org/10.1561/2200000035
Ida, Y., Kanai, S., Fujiwara, Y., Iwata, T., Takeuchi, K., Kashima, H.: Fast deterministic CUR matrix decomposition with accuracy assurance. In: International Conference on Machine Learning, pp. 4594–4603 (2020). PMLR
Sun, J., Xie, Y., Zhang, H., Faloutsos, C.: Less is more: Compact matrix decomposition for large sparse graphs. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 366–377 (2007). SIAM
Mahoney, M.W., Maggioni, M., Drineas, P.: Tensor-CUR decompositions for tensor-based data. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 327–336 (2006)
Drineas, P., Kerenidis, I., Raghavan, P.: Competitive recommendation systems. In: Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing, pp. 82–90 (2002)
Deshpande, A., Vempala, S.: Adaptive sampling and fast low-rank matrix approximation. In: Proceedings of the 9th International Conference on Approximation Algorithms for Combinatorial Optimization Problems, and 10th International Conference on Randomization and Computation. APPROX’06/RANDOM’06, pp. 292–303. Springer, Berlin, Heidelberg (2006). https://doi.org/10.1007/11830924_28
Deshpande, A., Rademacher, L., Vempala, S.S., Wang, G.: Matrix approximation and projective clustering via volume sampling. Theory Comput. 2(12), 225–247 (2006). https://doi.org/10.4086/toc.2006.v002a012
Guruswami, V., Sinop, A.K.: Optimal column-based low-rank matrix reconstruction. In: Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms. SODA ’12, pp. 1207–1214. Society for Industrial and Applied Mathematics, USA (2012)
Derezinski, M., Warmuth, M.K.: Unbiased estimates for linear regression via volume sampling. Advances in Neural Information Processing Systems 30 (2017)
Broadbent, M.E., Brown, M., Penner, K., Ipsen, I., Rehman, R.: Subset selection algorithms: Randomized vs. deterministic. SIAM Undergraduate Research Online 3(01) (2010)
Dan, C., Wang, H., Zhang, H., Zhou, Y., Ravikumar, P.K.: Optimal analysis of subset-selection based \(\ell _p\) low-rank approximation. Advances in Neural Information Processing Systems 32 (2019)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982). https://doi.org/10.1109/TIT.1982.1056489
Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms. SODA ’07, pp. 1027–1035. Society for Industrial and Applied Mathematics, USA (2007)
Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S.: Scalable k-means++. Proc. VLDB Endow. 5(7), 622–633 (2012). https://doi.org/10.14778/2180912.2180915
Bachem, O., Lucic, M., Hassani, S.H., Krause, A.: Approximate k-means++ in sublinear time. Proceedings of the AAAI Conference on Artificial Intelligence 30(1) (2016). https://doi.org/10.1609/aaai.v30i1.10259
Bachem, O., Lucic, M., Hassani, S.H., Krause, A.: Fast and provably good seedings for k-means. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. NIPS’16, pp. 55–63. Curran Associates Inc., Red Hook (2016)
Deshpande, A., Varadarajan, K.: Sampling-based dimension reduction for subspace approximation. In: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing. STOC ’07, pp. 641–650. Association for Computing Machinery, New York (2007). https://doi.org/10.1145/1250790.1250884
Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Mach. Learn. 75(2), 245–248 (2009). https://doi.org/10.1007/s10994-009-5103-0
Mahajan, M., Nimbhorkar, P., Varadarajan, K.R.: The planar k-means problem is NP-hard. Theor. Comput. Sci. 442, 13–21 (2012). https://doi.org/10.1016/j.tcs.2010.05.034
Aggarwal, A., Deshpande, A., Kannan, R.: Adaptive sampling for k-means clustering. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2009), pp. 15–28. Springer (2009). https://doi.org/10.1007/978-3-642-03685-9_2
Wei, D.: A constant-factor bi-criteria approximation guarantee for k-means++. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. NIPS’16, pp. 604–612. Curran Associates Inc., Red Hook (2016)
Jaiswal, R., Kumar, M., Yadav, P.: Improved analysis of \(D^2\)-sampling based PTAS for k-means and other clustering problems. Inf. Process. Lett. 115(2), 100–103 (2015). https://doi.org/10.1016/j.ipl.2014.07.009
Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time \((1+\epsilon )\)-approximation algorithm for \(k\)-means clustering in any dimensions. In: 45th Annual IEEE Symposium on Foundations of Computer Science, pp. 454–462 (2004). IEEE
Cohen, M.B., Musco, C., Musco, C.: Input sparsity time low-rank approximation via ridge leverage score sampling. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. SODA ’17, pp. 1758–1777. Society for Industrial and Applied Mathematics, USA (2017)
Mahabadi, S., Razenshteyn, I., Woodruff, D.P., Zhou, S.: Non-adaptive adaptive sampling on turnstile streams. In: Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing. STOC 2020, pp. 1251–1264. Association for Computing Machinery, New York (2020). https://doi.org/10.1145/3357713.3384331
Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985). https://doi.org/10.1145/3147.3165
Efraimidis, P.S., Spirakis, P.P.: Weighted random sampling. In: Encyclopedia of Algorithms, pp. 2365–2367 (2016). https://doi.org/10.1007/978-1-4939-2864-4_478
Ghashami, M., Liberty, E., Phillips, J.M., Woodruff, D.P.: Frequent directions: Simple and deterministic matrix sketching. CoRR abs/1501.01711 (2015). arXiv:1501.01711
Braverman, V., Drineas, P., Musco, C., Musco, C., Upadhyay, J., Woodruff, D.P., Zhou, S.: Near optimal linear algebra in the online and sliding window models. In: 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pp. 517–528 (2020). IEEE
Ban, F., Bhattiprolu, V., Bringmann, K., Kolev, P., Lee, E., Woodruff, D.P.: A PTAS for \(\ell_p\)-low rank approximation. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 747–766 (2019). SIAM
Chierichetti, F., Gollapudi, S., Kumar, R., Lattanzi, S., Panigrahy, R., Woodruff, D.P.: Algorithms for \(\ell _p\) low-rank approximation. In: International Conference on Machine Learning, pp. 806–814 (2017). PMLR
Liberty, E.: Simple and deterministic matrix sketching. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’13, pp. 581–588. Association for Computing Machinery, New York, NY, USA (2013). https://doi.org/10.1145/2487575.2487623
Ghashami, M., Phillips, J.M.: Relative errors for deterministic low-rank matrix approximations. In: Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms. SODA ’14, pp. 707–717. Society for Industrial and Applied Mathematics, USA (2014)
Ghashami, M., Liberty, E., Phillips, J.M., Woodruff, D.P.: Frequent directions: Simple and deterministic matrix sketching. SIAM J. Comput. 45(5), 1762–1792 (2016). https://doi.org/10.1137/15M1009718
Cormode, G., Dickens, C., Woodruff, D.: Leveraging well-conditioned bases: Streaming and distributed summaries in Minkowski p-norms. In: International Conference on Machine Learning, pp. 1048–1056 (2018)
Clarkson, K.L., Woodruff, D.P.: Low-rank approximation and regression in input sparsity time. J. ACM 63(6) (2017). https://doi.org/10.1145/3019134
Feldman, D., Monemizadeh, M., Sohler, C., Woodruff, D.P.: Coresets and sketches for high dimensional subspace approximation problems. In: Proceedings of the Twenty-first Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 630–649 (2010). SIAM
Ailon, N., Jaiswal, R., Monteleoni, C.: Streaming k-means approximation. Advances in Neural Information Processing Systems 22 (2009)
Jaiswal, R., Kumar, A., Sen, S.: A simple \(D^2\)-sampling based PTAS for k-means and other clustering problems. Algorithmica 70(1), 22–46 (2014). https://doi.org/10.1007/s00453-013-9833-9
Anari, N., Gharan, S.O., Rezaei, A.: Monte Carlo Markov chain algorithms for sampling strongly Rayleigh distributions and determinantal point processes. In: Conference on Learning Theory, pp. 103–115 (2016). PMLR
Chao, M.T.: A general purpose unequal probability sampling plan. Biometrika 69(3), 653–656 (1982). https://doi.org/10.1093/biomet/69.3.653
Cai, H.: Exact bound for the convergence of Metropolis chains. Stochastic Anal. Appl. 18(1), 63–71 (2000). https://doi.org/10.1080/07362990008809654