Big Data Privacy: Challenges to Privacy Principles and Models

Jordi Soria-Comas¹
¹Department of Computer Engineering and Mathematics, UNESCO Chair in Data Privacy, Universitat Rovira i Virgili, Tarragona, Catalonia

Abstract

This paper explores the challenges that big data raises for privacy-preserving data management. First, we examine the conflicts between big data and preexisting principles of private data management, such as consent, purpose limitation, transparency, and the individual rights of access, rectification and erasure. Anonymization appears as the best tool to mitigate such conflicts, and it is best implemented by adhering to a privacy model with precise privacy guarantees. For this reason, we evaluate how well the two main privacy models used in anonymization (k-anonymity and ε-differential privacy) meet the requirements of big data, namely composability, low computational cost and linkability.
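The abstract contrasts the two privacy models only in words. As a minimal illustrative sketch (not the paper's own method), the Python below checks k-anonymity over a chosen set of quasi-identifiers and answers counting queries with the Laplace mechanism, charging each answer against a total ε budget. The names `is_k_anonymous`, `laplace_noise` and `PrivacyAccountant`, as well as the toy records, are hypothetical. The sketch assumes counting queries, whose sensitivity is 1 when one record is added or removed, so Laplace noise of scale 1/ε makes each answer ε-differentially private; under sequential composition the ε values of successive answers add up, which is what the accountant enforces.

```python
import random
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k records (the k-anonymity condition)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two
    independent exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

class PrivacyAccountant:
    """Hypothetical helper enforcing a total epsilon budget under
    sequential composition: the epsilons of successive answers add up."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def noisy_count(self, records, predicate, epsilon, rng):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        true_count = sum(1 for r in records if predicate(r))
        # Adding or removing one record changes a count by at most 1
        # (sensitivity 1), so Laplace noise with scale 1/epsilon gives
        # epsilon-differential privacy for this answer.
        return true_count + laplace_noise(1.0 / epsilon, rng)

# Toy microdata (hypothetical): zip and age band are quasi-identifiers.
records = [
    {"zip": "43001", "age": "30-39", "disease": "flu"},
    {"zip": "43001", "age": "30-39", "disease": "cold"},
    {"zip": "43002", "age": "40-49", "disease": "flu"},
    {"zip": "43002", "age": "40-49", "disease": "asthma"},
]
print(is_k_anonymous(records, ["zip", "age"], k=2))             # True
print(is_k_anonymous(records, ["zip", "age", "disease"], k=2))  # False

rng = random.Random(42)
accountant = PrivacyAccountant(total_epsilon=1.0)
print(accountant.noisy_count(records, lambda r: r["disease"] == "flu", 0.5, rng))
print(accountant.noisy_count(records, lambda r: r["age"] == "30-39", 0.5, rng))
# A third query at epsilon 0.5 would exceed the budget and raise RuntimeError.
```

The two models differ in exactly the respect the abstract calls composability: the accountant gives ε-differential privacy a simple rule for combining answers, whereas each k-anonymity check is made in isolation, so two releases that are each k-anonymous can still be combined to narrow down an individual's record. On cost, checking k-anonymity is a single grouping pass, but finding an optimal k-anonymous generalization is NP-hard.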
