Scalable RDF data compression with MapReduce

Concurrency and Computation: Practice and Experience - Volume 25, Issue 1 - Pages 24-39 - 2013
Jacopo Urbani1,2, Jason Maassen1, Niels Drost1, F.J. Seinstra1, Henri E. Bal1
1Dept. of Computer Science, VU University, Amsterdam, The Netherlands
2Jacopo Urbani, Dept. of Computer Science, VU University, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands.

Abstract

The Semantic Web contains many billions of statements, which are released using the Resource Description Framework (RDF) data model. To better handle these large amounts of data, high-performance RDF applications must apply a compression technique. Unfortunately, because of the large input size, even this compression is challenging. In this paper, we propose a set of distributed MapReduce algorithms to efficiently compress and decompress a large amount of RDF data. Our approach uses a dictionary encoding technique that maintains the structure of the data. We highlight the problems of distributed data compression and describe the solutions that we propose. We have implemented a prototype using the Hadoop framework and evaluated its performance. We show that our approach is able to efficiently compress a large amount of data and scales linearly with both the input size and the number of nodes. Copyright © 2012 John Wiley & Sons, Ltd.
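
As an illustration of the dictionary encoding idea mentioned in the abstract, the following is a minimal single-machine sketch in Python. It is not the paper's distributed MapReduce algorithm: the function names (`build_dictionary`, `compress`, `decompress`), the ID assignment order, and the sample triples are all assumptions made for this example. It only shows how each RDF term can be replaced by a numeric ID while the triple structure (subject, predicate, object) is kept intact.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]          # (subject, predicate, object) as text
EncodedTriple = Tuple[int, int, int]   # the same triple as numeric IDs


def build_dictionary(triples: List[Triple]) -> Dict[str, int]:
    """Assign a numeric ID to each distinct term, in order of first appearance.

    Hypothetical helper: the ID scheme here is an assumption for illustration.
    """
    dictionary: Dict[str, int] = {}
    for triple in triples:
        for term in triple:
            if term not in dictionary:
                dictionary[term] = len(dictionary)
    return dictionary


def compress(triples: List[Triple], dictionary: Dict[str, int]) -> List[EncodedTriple]:
    """Replace every term by its ID; each triple keeps its three positions."""
    return [(dictionary[s], dictionary[p], dictionary[o]) for s, p, o in triples]


def decompress(encoded: List[EncodedTriple], dictionary: Dict[str, int]) -> List[Triple]:
    """Invert the dictionary and map the IDs back to the original terms."""
    reverse = {term_id: term for term, term_id in dictionary.items()}
    return [(reverse[s], reverse[p], reverse[o]) for s, p, o in encoded]


# Made-up sample data, not taken from the paper.
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:alice"),
]
dictionary = build_dictionary(triples)
encoded = compress(triples, dictionary)
restored = decompress(encoded, dictionary)
```

Because repeated terms share one dictionary entry, long URIs are stored once and each occurrence costs only a number. On a single machine the dictionary fits in one in-memory map; the abstract indicates that doing the same over a very large, distributed input is where the difficulty lies.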
