Mastering data privacy: leveraging K-anonymity for robust health data sharing
Tóm tắt
In modern healthcare systems, data sources are highly integrated, and the privacy challenges are becoming a paramount concern. Despite the critical importance of privacy preservation in safeguarding sensitive and private information across various domains, there is a notable deficiency of learning and training material for privacy preservation. In this research, we present a k-anonymity algorithm explicitly for educational purposes. The development of the k-anonymity algorithm is complemented by seven validation tests, that have also been used as a basis for constructing five learning scenarios on privacy preservation. The outcomes of this research provide a practical understanding of a well-known privacy preservation technique and extends the familiarity of k-anonymity and the fundamental concepts of privacy protection to a broader audience.
Từ khóa
Tài liệu tham khảo
Artal, R., Rubenfeld, S.: Ethical issues in research. Best Pract. Res. Clin. Obstet. Gynaecol. 43, 107–114 (2017)
Fields, B.G.: Regulatory, legal, and ethical considerations of telemedicine. Sleep Med. Clin. 15(3), 409–416 (2020)
Kayaalp, M.: Patient privacy in the era of big data. Balkan Med. J. 35(1), 8–17 (2018)
Büschel, I., Mehdi, R., Cammilleri, A., Marzouki, Y., Elger, B.: Protecting human health and security in digital Europe: how to deal with the “privacy paradox" ? Sci. Eng. Ethics 20, 639–658 (2014)
Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(05), 571–588 (2002)
Slijepčević, D., Henzl, M., Klausner, L.D., Dam, T., Kieseberg, P., Zeppelzauer, M.: k-anonymity in practice: How generalisation and suppression affect machine learning classifiers. Comput. Secur. 111, 102488 (2021)
Ren, W., Ghazinour, K., Lian, X.: \( kt \)-safety: graph release via \( k \)-anonymity and \( t \)-closeness. IEEE Trans. Knowl. Data Eng. (2022)
Wang, T., Xu, L., Zhang, M., Zhang, H., Zhang, G.: A new privacy protection approach based on k-anonymity for location-based cloud services. J. Circuits Syst. Comput. 31(05), 2250083 (2022)
K-Anonymity-Unveiled: K-Anonymity Demystified: Dive into k-Anonymity’s core with code and visuals. Learn how to safeguard privacy while preserving data, github.com. https://github.com/ionianCTF/K-Anonymity-Unveiled. Accessed 12 Aug 2023
Ren, W.,Tong, X.,Du, J.,Wang, N., Li, S., Min, G., Zhao, Z.: Privacy enhancing techniques in the internet of things using data anonymisation. Inf. Syst. Front., pp. 1–12 (2021)
Dimopoulou, S., Symvoulidis, C., Koutsoukos, K., Kiourtis, A., Mavrogiorgou, A., Kyriazis, D.: Mobile anonymization and pseudonymization of structured health data for research. In: 2022 Seventh International Conference On Mobile and Secure Services (MobiSecServ), pp. 1–6, IEEE (2022)
Louassef, B.R., Chikouche, N.: Privacy preservation in healthcare systems. In: 2021 International Conference on Artificial Intelligence for Cyber Security Systems and Privacy (AI-CSP), pp. 1–6, IEEE (2021)
Vovk, O., Piho, G., Ross, P.: Methods and tools for healthcare data anonymization: a literature review. Int. J. Gen. Syst. 52(3), 326–342 (2023)
Jain, P.,Gyanchandani, M., Khare, N.: Improved k-anonymity privacy-preserving algorithm using Madhya Pradesh state election commission big data. In: Integrated Intelligent Computing, Communication and Security, pp. 1–10 (2019)
Šarčević, T., Molnar, D., Mayer, R.: An analysis of different notions of effectiveness in k-anonymity. In: Privacy in Statistical Databases: UNESCO Chair in Data Privacy, International Conference, PSD 2020, Tarragona, Spain, September 23–25, 2020, Proceedings, pp. 121–135, Springer (2020)
Jain, P., Gyanchandani, M., Khare, N.: Enhanced secured map reduce layer for big data privacy and security. J. Big Data 6(1), 1–17 (2019)
Rajendran, K., Jayabalan, M., Rana, M.E.: A study on k-anonymity, l-diversity, and t-closeness techniques. IJCSNS 17(12), 172 (2017)
Abubakar, I.B., Yagnik, T., Mohammed, K.: Robustness of k-anonymization model in compliance with general data protection regulation. In: 2022 5th International Conference on Computing and Big Data (ICCBD), pp. 67–72, IEEE (2022)
Asad, M., Aslam, M., Jilani, S.F., Shaukat, S., Tsukada, M.: Shfl: K-anonymity-based secure hierarchical federated learning framework for smart healthcare systems. Future Internet 14(11), 338 (2022)
Sangaiah, A.K., Javadpour, A., Ja’fari, F., Pinto, P., Chuang, H.-M.: Privacy-aware and ai techniques for healthcare based on k-anonymity model in internet of things. IEEE Trans. Eng. Manag. (2023)
Mahesh, R., Meyyappan, T.: Anonymization technique through record elimination to preserve privacy of published data. In: 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering, pp. 328–332, IEEE (2013)
Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. J. Big Data 5(1), 1–18 (2018)
Arava, K., Lingamgunta, S.: Adaptive k-anonymity approach for privacy preserving in cloud. Arab. J. Sci. Eng. 45(4), 2425–2432 (2020)
De Pascale, D., Cascavilla, G., Tamburri, D.A., Van Den Heuvel, W.-J.: Real-world k-anonymity applications: the kgen approach and its evaluation in fraudulent transactions. Inf. Syst. 115, 102193 (2023)
Sahi, M.A., Abbas, H., Saleem, K., Yang, X., Derhab, A., Orgun, M.A., Iqbal, W., Rashid, I., Yaseen, A.: Privacy preservation in e-healthcare environments: state of the art and future directions. IEEE Access 6, 464–478 (2017)
Kanwal, T., Anjum, A., Khan, A.: Privacy preservation in e-health cloud: taxonomy, privacy requirements, feasibility analysis, and opportunities. Clust. Comput. 24, 293–317 (2021)
Gao, D., Liu, Y., Huang, A., Ju, C., Yu, H., Yang, Q.: Privacy-preserving heterogeneous federated transfer learning. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 2552–2559, IEEE (2019)
Simon, G.E., Shortreed, S.M., Coley, R.Y., Penfold, R.B., Rossom, R.C., Waitzfelder, B.E., Sanchez, K., Lynch, F.L.: Assessing and minimizing re-identification risk in research data derived from health care records. eGEMs, 7(1) (2019)
Github - nsubhaan/heart, github.com. https://github.com/nsubhaan/Heart. Accessed 18 June 2023
Velakanti, G., Jarathi, S., Harshini, M., Ankam, P., Vuppu, S.: Heart disease prediction using deep learning algorithm. In: International Conference on Soft Computing and Signal Processing, pp. 83–96 Springer (2021)
Lin, C.-Y.: A reversible privacy-preserving clustering technique based on k-means algorithm. Appl. Soft Comput. 87, 105995 (2020)
Gowda, V.T., Bagai, R.: Generating t-closed partitions of datasets with multiple sensitive attributes. In: 2023 7th International Conference on Cryptography, Security and Privacy (CSP), pp. 107–111, IEEE (2023)
Bae, Y.S., Park, Y., Lee, S.M., Seo, H.H., Lee, H., Ko, T., Lee, E., Park, S.M., Yoon, H.-J.: Development of blockchain-based health information exchange platform using hl7 fhir standards: usability test. IEEE Access 10, 79264–79271 (2022)
Kiourtis, A., Mavrogiorgou, A., Menychtas, A., Maglogiannis, I., Kyriazis, D.: Structurally mapping healthcare data to hl7 fhir through ontology alignment. J. Med. Syst. 43, 1–13 (2019)
Duda, S.N., Kennedy, N., Conway, D., Cheng, A.C., Nguyen, V., Zayas-Cabán, T., Harris, P.A.: Hl7 fhir-based tools and initiatives to support clinical research: a scoping review. J. Am. Med. Inform. Assoc. 29(9), 1642–1653 (2022)
GitHub - scikit-learn/scikit-learn: scikit-learn: machine learning in Python, github.com. https://github.com/scikit-learn/scikit-learn. Accessed 25 June 2023
GitHub - numpy/numpy: The fundamental package for scientific computing with Python, github.com. https://github.com/numpy/numpy. Accessed 25 June 2023
GitHub - scipy/scipy: SciPy library main repository, github.com. https://github.com/scipy/scipy. Accessed 25 June 2023
GitHub - pandas-dev/pandas: Flexible and powerful data analysis/manipulation library for python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more, github.com. https://github.com/pandas-dev/pandas. Accessed 25 June 2023
GitHub - jupyter/notebook: Jupyter Interactive Notebook, github.com. https://github.com/jupyter/notebook. Accessed 25 June 2023
Machanavajjhala, A., Kifer, D., Gehrke, J.,Venkitasubramaniam, M.: l-diversity: privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data (TKDD), 1(1), pp. 3–es (2007)
Shah, A., Abbas, H., Iqbal, W., Latif, R.: Enhancing e-healthcare privacy preservation framework through l-diversity. In: 2018 14th International Wireless Communications and Mobile Computing Conference (IWCMC), pp. 394–399, IEEE (2018)
Parra-Arnau, J., Rebollo-Monedero, D., Forné, J.: Privacy-enhancing technologies and metrics in personalized information systems. In: Advanced Research in Data Privacy, pp. 423–442, Springer (2014)
Caruccio, L., Desiato, D., Polese, G., Tortora, G., Zannone, N.: A decision-support framework for data anonymization with application to machine learning processes. Inf. Sci. 613, 1–32 (2022)
Zigomitros, A., Casino, F., Solanas, A., Patsakis, C.: A survey on privacy properties for data publishing of relational data. IEEE Access 8, 51071–51099 (2020)
GitHub - ionianCTF/privacy-permission-analysis: privacy: Permission analysis for Android Applications—github.com. https://github.com/ionianCTF/privacy-permission-analysis. Accessed 01 Oct 2023