Privacy-Preserving Data Sharing by Integrating Perturbed Distance Matrices

SN Computer Science - Volume 1 - Pages 1-10 - 2020

Hanten Chang¹, Hiroyasu Ando²

¹Division of Policy and Planning Sciences, Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba, Japan

²Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba, Japan

Abstract

Collecting large amounts of data helps machine learning produce models that are less biased. In many cases, similar data are distributed among organizations, and integrating them is difficult because of privacy and cost concerns. Integrating such distributed data without delivering the original data leads to the concept of data collaboration, which combines data held by different organizations in a secure manner. We propose a method in which a distance matrix of the original data, obtained using data common among the organizations, is shared in order to learn neighbor information about the original data. Specifically, the proposed method robustly integrates distributed data, with quality comparable to that of the concatenated raw data, even when the amount of data in each organization is small and the data bias is large. In addition, the proposed method is applicable to data contaminated by noise. To demonstrate its effectiveness, we performed a classification task on open biological data divided into several pieces and found that the classification results for the divided data were as precise as when all data were available. Finally, we show that, as a by-product, the robustness of the method against noise improves the anonymity of the original data.
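The idea of sharing distances rather than raw records can be illustrated with a small sketch. This is not the authors' algorithm: it assumes Euclidean distance, Gaussian perturbation, and a plain k-nearest-neighbor vote, and the names `distance_to_anchors`, `knn_predict`, `anchors`, and `noise_scale` are hypothetical. Each organization reports only the distances from its private samples to a set of shared anchor points, and the integrating party classifies a query using those distance rows.

```python
import math
import random


def distance_to_anchors(samples, anchors, noise_scale=0.0, rng=None):
    """Distances from each private sample to each shared anchor point.

    Assumption: Euclidean distance, optionally perturbed with Gaussian
    noise of standard deviation `noise_scale` before sharing.
    """
    rng = rng or random.Random(0)
    rows = []
    for x in samples:
        row = []
        for a in anchors:
            d = math.dist(x, a)
            if noise_scale > 0:
                d += rng.gauss(0.0, noise_scale)
            row.append(d)
        rows.append(row)
    return rows


def knn_predict(train_rows, train_labels, query_row, k=3):
    """Majority vote among the k training rows closest to the query row."""
    order = sorted(
        range(len(train_rows)),
        key=lambda i: math.dist(train_rows[i], query_row),
    )
    votes = [train_labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


# Shared (non-private) anchor data known to every organization.
anchors = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

# Private data held by two hypothetical organizations; it never leaves them.
org_a, labels_a = [(0.1, 0.2), (9.8, 10.1)], [0, 1]
org_b, labels_b = [(0.3, -0.1), (10.2, 9.9)], [0, 1]

# Each organization shares only its perturbed distance rows and labels.
shared_rows = (
    distance_to_anchors(org_a, anchors, noise_scale=0.1, rng=random.Random(1))
    + distance_to_anchors(org_b, anchors, noise_scale=0.1, rng=random.Random(2))
)
shared_labels = labels_a + labels_b

# A query point is represented the same way and classified on distances only.
query_row = distance_to_anchors([(0.2, 0.1)], anchors)[0]
pred = knn_predict(shared_rows, shared_labels, query_row, k=3)
```

In this toy setting the two classes are far apart, so a small perturbation of the shared distances does not change the neighbor vote. That mirrors, in a very simplified form, the abstract's claim that robustness to noise and anonymity go together.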

