Generating metadata from web documents: a systematic approach

Hsiang-Yuan Hsueh¹, Chun-Nan Chen², Kun-Fu Huang³
¹Computational Intelligence Technology Center, Industrial Technology Research Institute, Taiwan, R.O.C.
²Chunghwa Telecom Laboratories, Taiwan, R.O.C.
³Information & Communication Research Lab, Industrial Technology Research Institute, Taiwan, R.O.C.

Abstract

This paper proposes a mechanism that generates an RDF Semantic Web schema from a set of Web documents, to serve as the semantic metadata of that set. By analyzing both the structured and the unstructured content of Web documents, semi-structured Web documents can be conceptualized as resource objects with inter-relationships in an RDF graph. Technically, the hyperlinks, basic annotations, and keywords in Web documents are analyzed, and the corresponding RDF schema is generated according to the mechanism and rules proposed in this paper. Once the semantic metadata of document sets on the Web is translated systematically rather than edited manually, semantic operations on the Web, such as semantic query and semantic search, are expected to become feasible.
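As a rough illustration of the kind of translation described above, the following minimal sketch extracts hyperlinks, `<meta>` annotations, and keywords from a single HTML page and emits them as N-Triples. It is not the mechanism or rule set proposed in the paper: the function and class names, the `http://example.org/vocab#linksTo` property, and the choice of Dublin Core properties for title, description, and keywords are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical namespace for the link property; chosen only for this sketch.
EX = "http://example.org/vocab#"
# Dublin Core element set, used here as an assumed target vocabulary.
DC = "http://purl.org/dc/elements/1.1/"


class PageFacts(HTMLParser):
    """Collects hyperlinks, <meta> annotations, and the <title> of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def literal(text):
    """Escape a string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def page_to_ntriples(url, html):
    """Return a list of N-Triples lines describing one page as an RDF resource."""
    facts = PageFacts(url)
    facts.feed(html)
    subject = f"<{url}>"
    triples = []
    title = facts.title.strip()
    if title:
        triples.append(f"{subject} <{DC}title> {literal(title)} .")
    if "description" in facts.meta:
        triples.append(
            f"{subject} <{DC}description> {literal(facts.meta['description'])} ."
        )
    # Each comma-separated keyword becomes one subject statement.
    for keyword in facts.meta.get("keywords", "").split(","):
        if keyword.strip():
            triples.append(f"{subject} <{DC}subject> {literal(keyword.strip())} .")
    # Each hyperlink becomes a relationship between two resources.
    for target in facts.links:
        triples.append(f"{subject} <{EX}linksTo> <{target}> .")
    return triples
```

Calling `page_to_ntriples(url, html)` on each document in a set and concatenating the results would give one RDF graph in which pages are resources and hyperlinks are the relationships between them. Keywords are read only from the `keywords` meta tag here; extracting them from the page body is left out of this sketch.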
