A privacy-preserving text classification model based on word embedding and a privacy boundary constructed by a deep belief network
Multimedia Tools and Applications - Pages 1-26 - 2023
Abstract
To mine and classify information from reports or documents effectively while protecting the privacy of the mined results, we propose a privacy-preserving classification algorithm for text classification called the Word Embedding Combination Privacy-Preserving Support Vector Machine (WECPPSVM). We also propose the Privacy-Preserving Distinct and Independent Frequent Subsequence Extraction Algorithm (PPDIFSEA), which trains a Deep Belief Network (DBN) to compute the degree of independence of the training data fed to the classification model and thereby obtains the Privacy Boundary (PB). The PB is a prerequisite for both data sampling and privacy-noise generation. The model protects privacy by injecting privacy noise into the classification results, which interferes with privacy attacks based on background knowledge. Our quantitative analysis shows that WECPPSVM approaches mainstream text classification algorithms in classification accuracy while protecting privacy and without increasing computational complexity. In addition, the integration study and privacy threat evaluation confirm that PPDIFSEA combined with WECPPSVM achieves an acceptable level of both classification accuracy and privacy protection.
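As a rough illustration of the noise-injection idea described in the abstract, the sketch below perturbs per-class classifier scores with random noise before choosing a label. The abstract does not state which noise distribution is used or how the PB determines its magnitude, so the Laplace noise, the function names, and the `noise_scale` parameter are all assumptions made for illustration, not the authors' method.

```python
import random


def laplace_sample(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    if scale == 0:
        return 0.0  # no noise requested
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_label(scores, noise_scale, rng):
    """Add independent Laplace noise to each class score and return the top label.

    scores: dict mapping class label -> classifier score (hypothetical input,
            e.g. SVM decision values).
    noise_scale: hypothetical knob standing in for a PB-derived noise magnitude.
    """
    noisy = {label: s + laplace_sample(noise_scale, rng) for label, s in scores.items()}
    return max(noisy, key=noisy.get)


if __name__ == "__main__":
    rng = random.Random(0)
    scores = {"medical": 1.2, "finance": 0.2}
    # With zero noise the released label is the plain argmax.
    print(noisy_label(scores, 0.0, rng))
    # With larger noise the released label sometimes differs from the argmax.
    print([noisy_label(scores, 5.0, rng) for _ in range(10)])
```

A larger `noise_scale` makes the released label less informative to an attacker with background knowledge, at the cost of classification accuracy; this is the trade-off the abstract's evaluation addresses.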