On the Meaningfulness of “Big Data Quality” (Invited Paper)

Data Science and Engineering - Volume 1, Issue 1 - Pages 6-20 - 2016
Donatella Firmani (1), Massimo Mecella (2), Monica Scannapieco (3), Carlo Batini (4)
(1) Università di Roma Tor Vergata, Rome, Italy
(2) Sapienza Università di Roma, Rome, Italy
(3) Istituto Nazionale di Statistica (ISTAT), Rome, Italy
(4) Università di Milano Bicocca, Milan, Italy


References

Batini C, Scannapieco M (2015) Data and information quality. Dimensions, principles and techniques. Springer, New York

Batini C, Palmonari M, Viscusi G (2012) The many faces of information and their impact on information quality. In: Proceedings of the 17th international conference on information quality (IQ 2012)

Bergman MK (2001) The deep web: surfacing hidden value. J Electron Publ 7:1407

Bizer C (2007) Quality-driven information filtering in the context of Web-based information systems, PhD thesis. Freie Universität Berlin, March 2007

Bizer C, Heath T, Berners-Lee T (2009) Linked data—the story so far. Int J Semant Web Inf Syst 5(3):1–22

Carroll JJ (2003) Signing RDF graphs. Technical report, HPL-2003-142, HP Labs

Chall JS (1995) Readability revisited. The new Dale-Chall readability formula, vol 118. Brookline Books, Cambridge

Cohen W, Ravikumar P, Fienberg S (2003) A comparison of string metrics for matching names and records. KDD Workshop Data Clean Object Consol 3:73–78

Crosby PB (1979) Quality is free. McGraw-Hill, New York

Dalvi N, Machanavajjhala A, Pang B (2012) An analysis of structured data on the web. Proc VLDB Endow 5(7):680–691

de Ridder H, Endrikhovski S (2002) Image quality is fun: reflections on fidelity, usefulness and naturalness. SID Symp Dig Tech Pap 33:986–989

Dong XL, Saha B, Srivastava D (2013) Less is more: selecting sources wisely for integration. In: Proceedings of the 39th international conference on very large data bases, PVLDB’13. VLDB Endowment, pp 37–48

DuBay WH (2004) The principles of readability. http://www.impact-information.com/impactinfo/readability02.pdf

Elmagarmid AK, Ipeirotis PG, Verykios VS (2007) Duplicate record detection: a survey. IEEE Trans Knowl Data Eng 19(1):1–16

Fan W, Geerts F (2012) Foundations of data quality management. Synthesis lectures on data management. Morgan & Claypool, San Rafael

Farr JN, Jenkins JJ, Paterson DG (1951) Simplification of Flesch reading ease formula. J Appl Psychol 35(5):333

Fellegi IP, Holt D (1976) A systematic approach to automatic edit and imputation. J Am Stat Assoc 71(353):17–35

Flemming A (2011) Qualitätsmerkmale von Linked Data-veröffentlichenden Datenquellen. Diplomarbeit (Quality Criteria for Linked Data Sources) https://cs.uwaterloo.ca/~ohartig/files/DiplomarbeitAnnikaFlemming.pdf

Flesch R (1948) A new readability yardstick. J Appl Psychol 32(3):221

Fürber C, Hepp M (2011) Swiqa—a semantic web information quality assessment framework. In: Proceedings of the ECIS

Gal A (2015) Big data integration. In: Keynote speech at international conference on open and big data (OBD 2015), August 2015, IEEE CS Press

Gil Y, Artz D (2007) Towards content trust of web resources. Web Semant 5(4):227–239

Gonzales RC, Woods RE (2008) Digital image processing. Prentice Hall, Englewood Cliffs

Gunning R (1952) The technique of clear writing. McGraw Hill International Book, New York

He B, Patel M, Zhang Z, Chang K (2007) Accessing the deep web. Commun ACM 50(5):94–101

Hogan A, Umbrich J, Harth A, Cyganiak R, Polleres A, Decker S (2012) An empirical survey of linked data conformance. J Web Semant 14:14–44

Hua W, Wang Z, Wang H, Zheng K, Zhou X (2015) Short text understanding through lexical-semantic analysis. In: Poster at ICDE 2015

Ipeirotis PG, Gravano L (2002) Distributed search over the hidden web: hierarchical database sampling and selection. In: Proceedings of the 28th international conference on very large data bases. VLDB Endowment, pp 394–405

International Organization for Standardization - ISO. Quality management and quality assurance. Vocabulary. ISO 8402:1994

Jacobi I, Kagal L, Khandelwal A (2011) Rule-based trust assessment on the semantic web. In: International conference on Rule-based reasoning, programming, and applications series, pp 227–241

Juran JM (1988) Juran on planning for quality. The Free Press, New York

Kincaid JP, Fishburne RP Jr, Rogers RL, Chissom BS (1975) Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Technical report, DTIC Document

Kitson HD (1921) The mind of the buyer: a psychology of selling, vol 21549. Macmillan, New York

Klare GR (1974) Assessing readability. Read Res Q 10:62–102

Lei Y, Uren V, Motta E (2007) A framework for evaluating semantic metadata. In: Proceedings of the 4th international conference on knowledge capture, ACM

Li Q, Li Y, Gao J, Zhao B, Fan W, Han J (2014) Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation. In: Proceedings of the 2014 ACM SIGMOD international conference on Management of data

Li X, Dong XL, Lyons K, Meng W, Srivastava D (2012) Truth finding on the deep web: is the problem solved? Proc VLDB Endow 6(2):97–108

Manzoor A, Truong HL, Dustdar S (2008) On the evaluation of quality of context. In: Smart sensing and context. Springer

Mendes P, Mühleisen H, Bizer C (2012) Sieve: linked data quality assessment and fusion. In: Proceedings of the 2012 joint EDBT/ICDT workshops

NASSCOM (2012) Big data: the next big thing. Technical report, NASSCOM

Payne RS, McVay S (1971) Songs of humpback whales. Science 173:585–597

Pernici B, Scannapieco M (2003) Data quality in web information systems. J Data Semant 1:48–68

Pipino LL, Lee YW, Wang RY (2002) Data quality assessment. Commun ACM 45(4):211–218

Raghavan S, Garcia-Molina H (2001) Crawling the hidden web. In: Proceedings of the 27th international conference on very large data bases

Rekatsinas T, Dong XL, Srivastava D (2014) Characterizing and selecting fresh data sources. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data

Senter RJ, Smith EA (1967) Automated readability index. Technical report, DTIC Document

Sha K, Shi W (2008) Consistency-driven data quality management of networked sensor systems. J Parallel Distrib Comput 68(9):1207–1221

Stankovic JA (2014) Research directions for the internet of things. IEEE Internet Things J 1:3–9

UNECE. Classification of types of big data. http://www1.unece.org/stat/platform/display/bigdata/Classification+of+Types+of+Big+Data . Accessed Aug 2015

W3C. http://www.w3.org/WAI/ . Accessed Aug 2015

Wang RY, Strong DM (1996) Beyond accuracy: what data quality means to data consumers. J Manag Inf Syst 12(4):5–34

Wayne SR (1983) Quality control circle and company wide quality control. Qual Prog 16(10):14–17

Wu FJ, Kao YF, Tseng YC (2011) From wireless sensor networks towards cyber physical systems. Pervasive Mobile Comput 7(4):397–413

Wu W, Yu C, Doan A, Meng W (2004) An interactive clustering-based approach to integrating source query interfaces on the deep web. In: Proceedings of the 2004 ACM SIGMOD international conference on management of data

Zakaluk BL, Samuels SJ (eds) (1988) Readability: its past, present, and future. International Reading Association, Newark