Anonymized noise addition in subspaces for privacy-preserving data mining on high-dimensional continuous data

Peer-to-Peer Networking and Applications, Volume 14, Pages 1608-1628, 2021
Shashidhar Virupaksha1,2, Venkatesulu Dondeti1
1Department of CSE, VFSTR Deemed to be University, Guntur, India
2Department of CSE, Presidency University, Bengaluru, India

Abstract

Data privacy is a major concern in data mining, and privacy-preserving data mining algorithms have been used to address it. On high-dimensional continuous data, however, privacy-preserving data mining causes high data loss and information loss, and makes clusters very difficult to identify. This paper proposes a new technique, Anonymized Noise Addition in Subspaces (ANAS), which reduces data loss and information loss while improving both cluster identification and privacy. Anonymization through aggregation is performed in dense and non-dense subspaces, taking Euclidean distance into account, to reduce data loss and enhance privacy. Random noise within the subspace limits is then applied to the anonymized subspaces to improve cluster identification and reduce data loss. ANAS was run on benchmark datasets, and the results show that it identifies 80% of the clusters of the original data on sparse datasets, where existing techniques identify none. ANAS reduces data loss by 50% and information loss by 20%, and enhances privacy by 40%.
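The two steps named in the abstract, aggregation inside a subspace followed by random noise kept within the subspace limits, can be illustrated with a minimal sketch. This is not the authors' ANAS implementation: the function names (`aggregate`, `add_bounded_noise`, `anas_sketch`), the fixed group size `k`, the uniform noise distribution, and the `scale` parameter are all assumptions made for illustration, and the sketch assumes the records of one subspace have already been selected (for example by a subspace clustering step that is not shown).

```python
import math
import random


def aggregate(points, k):
    """Hypothetical fixed-size microaggregation: group each seed record with
    its nearest unassigned neighbours (Euclidean distance) and replace every
    member of the group with the group centroid."""
    remaining = list(range(len(points)))
    out = [None] * len(points)
    while remaining:
        if len(remaining) < 2 * k:
            # The last group absorbs the leftovers so no group is smaller
            # than k (unless the input itself has fewer than k records).
            group = list(remaining)
        else:
            seed = remaining[0]
            group = sorted(
                remaining, key=lambda i: math.dist(points[seed], points[i])
            )[:k]
        dims = len(points[group[0]])
        centroid = tuple(
            sum(points[i][d] for i in group) / len(group) for d in range(dims)
        )
        for i in group:
            out[i] = centroid
        remaining = [i for i in remaining if i not in group]
    return out


def add_bounded_noise(original, anonymized, scale=0.1, rng=None):
    """Add uniform noise to each anonymized value, then clamp the result to
    the per-dimension [min, max] limits of the original subspace.
    The uniform distribution and `scale` are assumptions of this sketch."""
    rng = rng or random.Random()
    dims = len(original[0])
    lo = [min(p[d] for p in original) for d in range(dims)]
    hi = [max(p[d] for p in original) for d in range(dims)]
    noisy = []
    for p in anonymized:
        row = []
        for d in range(dims):
            width = (hi[d] - lo[d]) * scale
            value = p[d] + rng.uniform(-width, width)
            row.append(min(hi[d], max(lo[d], value)))
        noisy.append(tuple(row))
    return noisy


def anas_sketch(points, k=3, scale=0.1, seed=None):
    """Aggregate first, then perturb within the subspace limits."""
    rng = random.Random(seed)
    return add_bounded_noise(points, aggregate(points, k), scale, rng)


# Toy subspace with two visibly separated groups of records.
subspace = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1),
            (8.0, 9.0), (8.2, 9.1), (7.9, 8.8)]
released = anas_sketch(subspace, k=3, scale=0.05, seed=0)
```

Clamping to the original minimum and maximum is one simple way to keep perturbed values inside the subspace; the paper may define the limits and the noise distribution differently.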

Keywords

#data privacy #data mining #anonymized noise #subspace #data loss
