WoLMIS: a labor market intelligence system for classifying web job vacancies

Roberto Boselli¹, Mirko Cesarini¹, Stefania Marrara², Fabio Mercorio¹, Mario Mezzanzanica¹, Gabriella Pasi², Marco Viviani²
¹ CRISP Research Center, University of Milano-Bicocca, Milan, Italy
² Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy

Abstract

Keywords


References

Amato, F., Boselli, R., Cesarini, M., Mercorio, F., Mezzanzanica, M., Moscato, V., Persia, F., & Picariello, A. (2015). Challenge: processing web texts for classifying job offers. In 2015 IEEE international conference on semantic computing (ICSC) (pp. 460–463). https://doi.org/10.1109/ICOSC.2015.7050852 .

Amato, F., Boselli, R., Cesarini, M., Mercorio, F., Mezzanzanica, M., Moscato, V., Persia, F., & Picariello, A. (2015). Classification of job advertisements: a case study. In 23rd Italian symposium on advanced database systems, SEBD 2015, Gaeta, Italy, June 14-17, 2015 (pp. 144–151). http://dblp.uni-trier.de/rec/bib/conf/sebd/AmatoBCMMMPP15 .

Andrews, S., Gibson, H., Domdouzis, K., & Akhgar, B. (2016). Creating corroborated crisis reports from social media data through formal concept analysis. Journal of Intelligent Information Systems, 47(2), 287–312. https://doi.org/10.1007/s10844-016-0404-9 .

Beblavý, M., Fabo, B., & Lenaerts, K. (2016). Skills requirements for the 30 most-frequently advertised occupations in the United States: an analysis based on online vacancy data. Tech. Rep. 132, Centre for European Policy Studies (CEPS). http://ssrn.com/abstract=2749549 .

Bifet, A., & Frank, E. (2010). Sentiment knowledge discovery in twitter streaming data. In International conference on discovery science (pp. 1–15). Springer.

Boselli, R., Cesarini, M., Mercorio, F., & Mezzanzanica, M. (2014). Planning meets data cleansing. In The 24th international conference on automated planning and scheduling (ICAPS) (pp. 439–443). http://www.aaai.org/ocs/index.php/ICAPS/ICAPS14/paper/view/7898 .

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324 .

Califf, M.E. (1998). Relational learning techniques for natural language information extraction. Ph.D. thesis University of Texas at Austin.

Califf, M.E., & Mooney, R.J. (1999). Relational learning of pattern-match rules for information extraction. In AAAI/IAAI (pp. 328–334).

Carnevale, A.P., Jayasundera, T., & Repnikov, D. (2014). Understanding online job ads data: a technical report. Tech. rep., Georgetown University, McCourt School on Public Policy, Center on Education and the Workforce. https://cew.georgetown.edu/wp-content/uploads/2014/11/OCLM.Tech_.Web_.pdf .

Ceci, M., & Malerba, D. (2007). Classifying web documents in a hierarchy of categories: a comprehensive study. Journal of Intelligent Information Systems, 28(1), 37–78.

Cesarini, M., Mezzanzanica, M., & Fugini, M. (2007). Analysis-sensitive conversion of administrative data into statistical information systems. Journal of Cases on Information Technology, 9(4), 57–81.

Chang, C.C., & Lin, C.J. (2011). Libsvm: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.

Crowther, P.S., & Cox, R.J. (2005). A method for optimal division of data sets for use in neural networks. In Khosla, R., Howlett, R.J., & Jain, L.C. (Eds.) 9th International conference on knowledge-based intelligent information and engineering systems, KES 2005, Melbourne, Australia, September 14-16, 2005, Proceedings, Part IV (pp. 1–7). Berlin: Springer. https://doi.org/10.1007/11554028_1

Elias, P., & Purcell, K. (2004). SOC (HE): a classification of occupations for studying the graduate labour market. Tech. rep., Institute for Employment Research, University of Warwick, Coventry, UK. http://www2.warwick.ac.uk/fac/soc/ier/research/completed/7yrs2/rp6.pdf .

ENRLMM (2016). The european network on regional labour market monitoring. http://www.regionallabourmarketmonitoring.net/ . Visited on 2016-11-11.

Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., & Lin, C.J. (2008). Liblinear: a library for large linear classification. The Journal of Machine Learning Research, 9 (Aug), 1871–1874.

Freitag, D., & Kushmerick, N. (2000). Boosted wrapper induction. In AAAI/IAAI (pp. 577–583).

Haykin, S. (1999). A comprehensive foundation of neural networks. Upper Saddle River: Prentice Hall.

Hong, W., Zheng, S., & Wang, H. (2013). Dynamic user profile-based job recommender system. In 2013 8th international conference on computer science & education (ICCSE) (pp. 1499–1503). IEEE.

Hsu, C.W., Chang, C.C., & Lin, C.J. (2003). A practical guide to support vector classification. Tech. rep., Department of Computer Science and Information Engineering, National Taiwan University. https://www.cs.sfu.ca/people/Faculty/teaching/726/spring11/svmguide.pdf .

ISCO (2012). International standard classification of Occupations. Visited on 2016-11-11.

Jain, A.K., Mao, J., & Mohiuddin, K.M. (1996). Artificial neural networks: a tutorial. IEEE Computer, 29(3), 31–44.

Javed, F., McNair, M., Jacob, F., & Zhao, M. (2016). Towards a job title classification system. arXiv: 1606.00917 .

Jindal, N., & Liu, B. (2008). Opinion spam and analysis. In Proceedings of the 2008 International Conference on Web Search and Data Mining (pp. 219–230): ACM.

Joachims, T. (1998). Text categorization with support vector machines: learning with many relevant features. In Nédellec, C., & Rouveirol, C. (Eds.) Machine Learning: ECML-98, Lecture Notes in Computer Science, (Vol. 1398 pp. 137–142). Berlin: Springer. https://doi.org/10.1007/BFb0026683 .

Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2016). Bag of tricks for efficient text classification. arXiv: 1607.01759 .

Kanan, T., & Fox, E.A. (2016). Automated arabic text classification with p-stemmer, machine learning, and a tailored news article taxonomy. JASIST, 67(11), 2667–2683. https://doi.org/10.1002/asi.23609 .

Kessler, R., Torres-Moreno, J.M., & El-Bèze, M. (2007). E-gen: automatic job offer processing system for human resources. In Mexican international conference on artificial intelligence (pp. 985–995). Springer.

Koperwas, J., Skonieczny, Ł., Kozłowski, M., Andruszkiewicz, P., Rybiński, H., & Struk, W. (2016). Intelligent information processing for building university knowledge base. Journal of Intelligent Information Systems, 48, 141–163.

Kureková, L.M., Beblavý, M., & Thum-Thysen, A. (2015). Using online vacancies and web surveys to analyse the labour market: a methodological inquiry. IZA Journal of Labor Economics, 4(1), 1–20. https://doi.org/10.1186/s40172-015-0034-4 .

Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of the eighteenth international conference on machine learning, ICML (Vol. 1 pp. 282–289).

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521 (7553), 436–444.

Lee, I. (2011). Modeling the benefit of e-recruiting process integration. Decision Support Systems, 51(1), 230–239.

Lembo, D., Torlone, R., & Marella, A. (Eds.) (2015). In 23rd Italian symposium on advanced database systems, SEBD 2015, Gaeta, Italy, June 14-17, 2015. Curran Associates, Inc. ISBN: 978-1-5108-1087-7. http://dblp.uni-trier.de/rec/bib/conf/sebd/2015 .

LFS (2016). Labour force survey. http://ec.europa.eu/eurostat/web/microdata/european-union-labour-force-survey . Visited on 2016-11-11.

Lippmann, R. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine, 4(2), 4–22.

Marrara, S., Pasi, G., Viviani, M., Cesarini, M., Mercorio, F., Mezzanzanica, M., & Pappagallo, M. (2017). A language modelling approach for discovering novel labour market occupations from the web. In Proceedings of the international conference on web intelligence, Leipzig, Germany, August 23–26, 2017 (pp. 1026-1034). http://dblp.uni-trier.de/rec/bib/conf/webi/MarraraPVCMMP17 , http://doi.acm.org/10.1145/3106426.3109035 .

Mezzanzanica, M., Boselli, R., Cesarini, M., & Mercorio, F. (2012). Data quality sensitivity analysis on aggregate indicators. In Helfert, M., Francalanci, C., & Filipe, J. (Eds.) Proceedings of the international conference on data technologies and applications, data 2012 (pp. 97–108). INSTICC. https://doi.org/10.5220/0004040300970108 .

Mezzanzanica, M., Boselli, R., Cesarini, M., & Mercorio, F. (2015). A model-based evaluation of data quality activities in KDD. Information Processing & Management, 51(2), 144–166. https://doi.org/10.1016/j.ipm.2014.07.007 http://www.sciencedirect.com/science/article/pii/S0306457314000673 .

Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111–3119).

Mooney, R.J., & Bunescu, R. (2005). Mining knowledge from text using information extraction. SIGKDD Explorations Newsletter, 7(1), 3–10. https://doi.org/10.1145/1089815.1089817 .

Müller, K. R., Mika, S., Rätsch, G., Tsuda, K., & Schölkopf, B. (2001). An introduction to Kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2), 181–201.

Nahm, U.Y., & Mooney, R.J. (2001). Mining soft-matching rules from textual data. In Proceedings of the 17th international joint conference on artificial intelligence (Vol. 2 pp. 979–984). Morgan Kaufmann Publishers Inc.

Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on empirical methods in natural language processing (Vol. 10 pp. 79–86). Association for Computational Linguistics.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

Perea-Ortega, J.M., Martín-Valdivia, M.T., López, L.A.U., & Martínez-Cámara, E. (2013). Improving polarity classification of bilingual parallel corpora combining machine learning and semantic orientation approaches. JASIST, 64(9), 1864–1877. https://doi.org/10.1002/asi.22884 .

Poch, M., Bel, N., Espeja, S., & Navío, F. (2014). Ranking job offers for candidates: learning hidden knowledge from big data. In Language resources and evaluation conference.

Samuelson, P.A. (1974). Remembrances of Frisch. European Economic Review, 5(1), 7–23.

Sayfullina, L., Malmi, E., Liao, Y., & Jung, A. (2017). Domain adaptation for resume classification using convolutional neural networks. arXiv: 1707.05576 .

Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1), 1–47.

Segel, E., & Heer, J. (2010). Narrative visualization: telling stories with data. IEEE Transactions on Visualization and Computer Graphics, 16(6), 1139–1148.

Sheth, A.P., Ngonga, A., Wang, Y., Chang, E., Slezak, D., Franczyk, B., Alt, R., Tao, X., & Unland, R. (Eds.) (2017). In Proceedings of the international conference on web intelligence, Leipzig, Germany, August 23–26, 2017. ACM. ISBN: 978-1-4503-4951-2.

Singh, A., Rose, C., Visweswariah, K., Chenthamarakshan, V., & Kambhatla, N. (2010). Prospect: a system for screening candidates for recruitment. In Proceedings of the 19th ACM international conference on information and knowledge management (pp. 659–668). ACM.

SOC2000 (2016). http://www.ons.gov.uk/ons/guide-method/classifications/archived-standard-classifications/standard-occupational-classification-2000/index.html . Visited on 2016-11-11.

Sun, Q., Amin, M., Yan, B., Martell, C., Markman, V., Bhasin, A., & Ye, J. (2015). Transfer learning for bilingual content classification. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 2147–2156). ACM.

Tang, D., Qin, B., & Liu, T. (2015). Document modeling with gated recurrent neural network for sentiment classification. In EMNLP (pp. 1422–1432).

Turian, J., Ratinov, L., & Bengio, Y. (2010). Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th annual meeting of the association for computational linguistics (pp. 384–394). Association for Computational Linguistics.

Vilares, D., Alonso, M.A., & Gómez-Rodríguez, C. (2015). On the usefulness of lexical and syntactic processing in polarity classification of twitter messages. JASIST, 66(9), 1799–1816. https://doi.org/10.1002/asi.23284 .

Viviani, M., & Pasi, G. (2017). Credibility in social media: opinions, news, and health information - a survey. WIREs Data Mining and Knowledge Discovery. https://doi.org/10.1002/widm.1209 .

Xu, H., Gu, C., Zhou, H., & Zhang, J. (2017). arXiv: 1705.06123 .

Yang, Y., & Pedersen, J.O. (1997). A comparative study on feature selection in text categorization. In ICML, (Vol. 97 pp. 412–420).

Yi, X., Allan, J., & Croft, W.B. (2007). Matching resumes and jobs based on relevance models. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (pp. 809–810). ACM.

Zhu, C., Zhu, H., Xiong, H., Ding, P., & Xie, F. (2016). Recruitment market trend analysis with sequential latent variable models. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’16. (pp. 383–392). New York: ACM. https://doi.org/10.1145/2939672.2939689

Zubiaga, A., Spina, D., Martínez-Unanue, R., & Fresno, V. (2015). Real-time classification of twitter trends. JASIST, 66(3), 462–473. https://doi.org/10.1002/asi.23186 .