Privacy-preserving text classification model with word embeddings and a privacy boundary constructed by a deep belief network

Bo Ma¹, Edmund Lai¹, Wei Qi Yan¹, Jinsong Wu²
¹ School of Engineering, Computer & Mathematical Sciences, Auckland University of Technology, Auckland, New Zealand
² Department of Computer Sciences, Universidad de Chile, Santiago, Chile

Abstract

To mine and classify information from reports and documents effectively while protecting the privacy of the mined results, we propose a privacy-preserving algorithm for text classification, the Word Embedding Combined Privacy-Preserving Support Vector Machine (WECPPSVM). We also propose the Privacy-Preserving Independent Frequent Subsequence Extraction Algorithm (PPDIFSEA), which trains a Deep Belief Network (DBN) to compute the degree of independence of the training data fed to the classification model and thereby obtains the Privacy Boundary (PB). The PB is a prerequisite for both data sampling and the generation of privacy noise. The model protects privacy by injecting privacy noise into the classification results, which interferes with privacy attacks based on background knowledge. Our quantitative analysis shows that WECPPSVM approaches the accuracy of mainstream text classification algorithms while preserving privacy and without increasing computational complexity. In addition, an integration study and a privacy threat evaluation confirm that PPDIFSEA combined with WECPPSVM achieves acceptable classification accuracy and an acceptable level of privacy protection.
