Sparse p-adic data coding for computationally efficient and effective big data analytics

Pleiades Publishing Ltd - Tập 8 Số 3 - Trang 236-247 - 2016
Murtagh, F.1,2
1Department of Computing and Mathematics, University of Derby, Derby, UK
2Department of Computing, Goldsmiths, University of London, London, UK

Tóm tắt

We develop the theory and practical implementation of p-adic sparse coding of data. Rather than the standard, sparsifying criterion that uses the L 0 pseudo-norm, we use the p-adic norm.We require that the hierarchy or tree be node-ranked, as is standard practice in agglomerative and other hierarchical clustering, but not necessarily with decision trees. In order to structure the data, all computational processing operations are direct reading of the data, or are bounded by a constant number of direct readings of the data, implying linear computational time. Through p-adic sparse data coding, efficient storage results, and for bounded p-adic norm stored data, search and retrieval are constant time operations. Examples show the effectiveness of this new approach to content-driven encoding and displaying of data.

Tài liệu tham khảo

citation_journal_title=IEEE Trans. Inf. Theory; citation_title=Model-based compressive sensing; citation_author=R. G. Baraniuk, V. Cevher, M. F. Duarte, C. Hegde; citation_volume=56; citation_issue=4; citation_publication_date=2010; citation_pages=1982-2001; citation_doi=10.1109/TIT.2010.2040894; citation_id=CR1 H.-H. Bock, “Origins and extensions of the k-means algorithm in cluster analysis,” E-Journ. Hist. Prob. Stat. 4 (2) (2008). http://wwwjehpsnet/Decembre2008/Bockpdf. citation_journal_title=p-Adic Numbers Ultrametric Anal. Appl.; citation_title=On p-adic classification; citation_author=P. E. Bradley; citation_volume=1; citation_issue=4; citation_publication_date=2009; citation_pages=271-285; citation_doi=10.1134/S2070046609040013; citation_id=CR3 L. Brekke and P. G. O. Freund, “p-Adic numbers in physics”, Phys. Rep. 233, 1–66 (1993). citation_journal_title=Proc. Amer. Phil. Soc.; citation_title=The law of anomalous numbers; citation_author=F. Benford; citation_volume=78; citation_publication_date=1938; citation_pages=551-572; citation_id=CR5 citation_title=An Introduction to Benford’s Law; citation_publication_date=2015; citation_id=CR6; citation_author=A. Berger; citation_author=T. P. Hill citation_journal_title=J. Class.; citation_title=Fast, linear time hierarchical clustering using the Baire metric; citation_author=P. Contreras, F. Murtagh; citation_volume=29; citation_publication_date=2012; citation_pages=118-143; citation_doi=10.1007/s00357-012-9106-3; citation_id=CR7 citation_journal_title=Comp. J.; citation_title=p-Adic modelling of the genome and the genetic code; citation_author=B. Dragovich, A. Dragovich; citation_volume=53; citation_publication_date=2010; citation_pages=432-42; citation_doi=10.1093/comjnl/bxm083; citation_id=CR8 citation_journal_title=Comp. J.; citation_title=Value-coded trie structure for high-performance IPv6 lookup; citation_author=O. Erdem, A. Carus, H. Le; citation_volume=58; citation_issue=2; citation_publication_date=2015; citation_pages=204-214; citation_doi=10.1093/comjnl/bxt153; citation_id=CR9 citation_title=p-Adic Numbers; citation_publication_date=2003; citation_id=CR10; citation_author=F. Q. Gouvea; citation_publisher=Springer citation_journal_title=J. Royal Stat. Soc. Ser. B; citation_title=Geometric representation of high dimension, low sample size data; citation_author=P. Hall, J. S. Marron, A. Neeman; citation_volume=67; citation_publication_date=2005; citation_pages=427-444; citation_doi=10.1111/j.1467-9868.2005.00510.x; citation_id=CR11 citation_journal_title=IEICE Trans. Comm.; citation_title=A user’s guide to compressed sensing for communications systems; citation_author=K. Hayashi, M. Nagahara, T. Tanaka; citation_volume=E96-B; citation_issue=3; citation_publication_date=2013; citation_pages=685-712; citation_doi=10.1587/transcom.E96.B.685; citation_id=CR12 citation_journal_title=Stat. Sci.; citation_title=A statistical derivation of the significant-digit law; citation_author=T. P. Hill; citation_volume=10; citation_issue=4; citation_publication_date=1995; citation_pages=354-363; citation_id=CR13 citation_title=The Ternary Manifesto, including “Standard ternary logic,” “Ternary arithmetic,” “Ter- SCII: ternary standard code for information interchange,”; citation_publication_date=2012; citation_id=CR14; citation_author=D. W. Jones citation_journal_title=Proc. Steklov Inst.Math.; citation_title=Gene expression from polynomial dynamics in the 2-adic information space; citation_author=A. Yu. Khrennikov; citation_volume=265; citation_publication_date=2009; citation_pages=131-139; citation_doi=10.1134/S0081543809020114; citation_id=CR15 citation_journal_title=Comptes-Rendus de l’Acadé mie des Sciences, Tome II; citation_title=Nombres semi-ré els et espaces ultramé triques; citation_author=M. Krasner; citation_volume=219; citation_publication_date=1944; citation_pages=433-435; citation_id=CR16 citation_journal_title=IEEE Trans. Comm.; citation_title=An algorithm for vector quantization design; citation_author=Y. Linde, A. Buzo, R. M. Gray; citation_volume=28; citation_publication_date=1980; citation_pages=84-95; citation_doi=10.1109/TCOM.1980.1094577; citation_id=CR17 citation_title=Linear embedding of binary hierarchies and its applications; citation_inbook_title=Mathematical Hierarchies and Biology, DIMACS Series Disc. Math. Theor. Comp. Sci.; citation_publication_date=1997; citation_pages=331-356; citation_id=CR18; citation_author=B. Mirkin citation_title=A top-down method for building genome classification trees with linear binary hierarchies; citation_inbook_title=Bioconsensus, DIMACS Ser.; citation_publication_date=2003; citation_pages=97-112; citation_id=CR19; citation_author=B. Mirkin; citation_author=E. Koonin citation_journal_title=Proc. Steklov Inst. Math.; citation_title=Symmetry in data mining and analysis: a unifying view based on hierarchy; citation_author=F. Murtagh; citation_volume=265; citation_publication_date=2009; citation_pages=177-198; citation_doi=10.1134/S0081543809020175; citation_id=CR20 citation_journal_title=p-Adic Numbers Ultrametric Anal. Appl.; citation_title=From data to the p-adic or ultrametic model; citation_author=F. Murtagh; citation_volume=1; citation_publication_date=2009; citation_pages=58-68; citation_doi=10.1134/S2070046609010063; citation_id=CR21 citation_journal_title=p-Adic Numbers Ultrametric Anal. Appl.; citation_title=Fast, linear time, m-adic hierarchical clustering for search and retrieval using the Baire metric, with linkages to generalized ultrametrics, hashing, Formal Concept Analysis, and precision of data measurement; citation_author=F. Murtagh, P. Contreras; citation_volume=4; citation_publication_date=2012; citation_pages=45-56; citation_id=CR22 citation_title=MoreLikeThis and Scoring in Solr; citation_publication_date=2013; citation_id=CR23; citation_author=F. Murtagh citation_title=Linear storage and potentially constant time hierarchical clustering using the Baire metric and random spanning paths; citation_inbook_title=Analysis of Large and Complex Data; citation_publication_date=2016; citation_id=CR24; citation_author=F. Murtagh; citation_author=P. Contreras citation_title=Clustering through high dimensional data scaling: applications and implementations; citation_inbook_title=Proceedings, ECDA 2015, European Conf. on Data Analysis, forthcoming; citation_publication_date=2016; citation_id=CR25; citation_author=F. Murtagh; citation_author=P. Contreras citation_title=Big Data scaling through metric mapping: Exploiting the remarkable simplicity of very high dimensional spaces using Correspondence Analysis; citation_inbook_title=Proceedings, IFCS 2015, Int. Fed. Class. Soc., forthcoming; citation_publication_date=2016; citation_id=CR26; citation_author=F. Murtagh citation_title=Random projection towards the Baire metric for high dimensional clustering; citation_inbook_title=Statistical Learning and Data Sciences, Lect. Notes Art. Intell.; citation_publication_date=2015; citation_pages=424-431; citation_id=CR27; citation_author=F. Murtagh; citation_author=P. Contreras citation_title=Sparse coding; citation_publication_date=2013; citation_id=CR28; citation_author=A. Ng citation_journal_title=Cont. Math.; citation_title=p-Adic arithmetic coding; citation_author=A. Rodionov, S. Volkov; citation_volume=508; citation_publication_date=2010; citation_pages=201-213; citation_doi=10.1090/conm/508/10001; citation_id=CR29 citation_title=p-Adic arithmetic coding; citation_publication_date=2007; citation_id=CR30; citation_author=A. Rodionov; citation_author=S. Volkov citation_title=Ultrametric Calculus; citation_publication_date=1984; citation_id=CR31; citation_author=W. H. Schikhof; citation_publisher=Cambridge Univ. Press citation_title=The Sciences of the Artificial; citation_publication_date=1996; citation_id=CR32; citation_author=H. A. Simon