Evaluate, Reorganize and Share: An Approach to Dynamically Organize Digital Hierarchies

Journal on Data Semantics - Volume 3 - Pages 225-236 - 2014
Rodrigo Dias Arruda Senra, Claudia Bauzer Medeiros
Institute of Computing, University of Campinas (UNICAMP), Campinas, Brazil

Abstract

We are overwhelmed and overloaded with the data deluge brought by the digital age. Hierarchies are pervasive cognitive patterns that allow us to reorganize data and reduce the dimensionality of the information space to manageable levels (e.g., filesystems and navigational menus). In spite of their widespread adoption, such hierarchies can be improved to cope with the present needs of data sharing and reuse. First, we seldom use mechanisms to evaluate how well they partition the information space. Second, we build static and content-driven hierarchies instead of dynamic and context-driven (i.e., task-driven) ones. Third, we use ad hoc and implicit hierarchization criteria, whereas they should be explicit and shareable. This paper discusses the problems related to the construction of hierarchies, and presents a conceptual framework to turn them into reconfigurable and shareable artifacts. Moreover, it explores how dynamically reconfigurable hierarchies can better cope with the multi-faceted nature of content, illustrating these principles through a tool that validates our proposal.
