Automatically identifying the function and intent of posts in underground forums

Andrew Caines¹, Sergio Pastrana², Alice Hutchings², Paula Buttery¹
¹Natural Language & Information Processing, Department of Computer Science & Technology, University of Cambridge, Cambridge, UK
²Cambridge Cybercrime Centre, Department of Computer Science & Technology, University of Cambridge, Cambridge, UK

Abstract

Keywords


References

Chen, T., He, T., Benesty, M., Khotilovich, V., & Tang, Y. (2018). xgboost: Extreme Gradient Boosting. R package version 0.6.4.1. https://CRAN.R-project.org/package=xgboost

Daumé III, H. (2007). Frustratingly easy domain adaptation. In Proceedings of the 45th annual meeting of the association of computational linguistics.

Durrett, G., Kummerfeld, J. K., Berg-Kirkpatrick, T., Portnoff, R., Afroz, S., McCoy, D., Levchenko, K., & Paxson, V. (2017). Identifying products in online cybercrime marketplaces: A dataset for fine-grained domain adaptation. In Proceedings of the 2017 conference on empirical methods in natural language processing.

Fleiss, J. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378–382.

Garrette, D., Mielens, J., & Baldridge, J. (2013). Real-world semi-supervised learning of POS-taggers for low-resource languages. In Proceedings of the 51st annual meeting of the association for computational linguistics (Volume 1: Long Papers).

Helleputte, T. (2017). LiblineaR: Linear predictive models based on the LIBLINEAR C/C++ Library. R package version 2.10-8.

Hoogeveen, D., Wang, L., Baldwin, T., & Verspoor, K. M. (2018). Web forum retrieval and text analytics: A survey. Foundations and Trends in Information Retrieval, 12, 1–163.

Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In Machine learning: ECML-98. Berlin: Springer.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.

Li, W., & Chen, H. (2014). Identifying top sellers in underground economy using deep learning-based sentiment analysis. In Proceedings of the 2014 joint intelligence and security informatics conference.

Lui, M., & Baldwin, T. (2010). Classifying user forum participants: Separating the gurus from the hacks, and other tales of the Internet. In Proceedings of Australasian language technology association workshop.

Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 conference on empirical methods in natural language processing.

Pastrana, S., Hutchings, A., Caines, A., & Buttery, P. (2018a). Characterizing Eve: Analysing cybercrime actors in a large underground forum. In The 21st international symposium on research in attacks, intrusions and defenses (RAID).

Pastrana, S., Thomas, D., Hutchings, A., & Clayton, R. (2018b). CrimeBB: Enabling cybercrime research on underground forums at scale. In Proceedings of the 27th international conference on World Wide Web (WWW’18).

Portnoff, R. S., Afroz, S., Durrett, G., Kummerfeld, J. K., Berg-Kirkpatrick, T., McCoy, D., et al. (2017). Tools for automated analysis of cybercriminal markets. In Proceedings of the 26th international conference on World Wide Web (WWW’17).

Spärck-Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 11–21.

Turian, J., Ratinov, L. A., & Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th annual meeting of the association for computational linguistics.