Sparse p-adic data coding for computationally efficient and effective big data analytics
Tóm tắt
We develop the theory and practical implementation of p-adic sparse coding of data. Rather than the standard, sparsifying criterion that uses the L 0 pseudo-norm, we use the p-adic norm.We require that the hierarchy or tree be node-ranked, as is standard practice in agglomerative and other hierarchical clustering, but not necessarily with decision trees. In order to structure the data, all computational processing operations are direct reading of the data, or are bounded by a constant number of direct readings of the data, implying linear computational time. Through p-adic sparse data coding, efficient storage results, and for bounded p-adic norm stored data, search and retrieval are constant time operations. Examples show the effectiveness of this new approach to content-driven encoding and displaying of data.
Tài liệu tham khảo
citation_journal_title=IEEE Trans. Inf. Theory; citation_title=Model-based compressive sensing; citation_author=R. G. Baraniuk, V. Cevher, M. F. Duarte, C. Hegde; citation_volume=56; citation_issue=4; citation_publication_date=2010; citation_pages=1982-2001; citation_doi=10.1109/TIT.2010.2040894; citation_id=CR1
H.-H. Bock, “Origins and extensions of the k-means algorithm in cluster analysis,” E-Journ. Hist. Prob. Stat. 4 (2) (2008). http://wwwjehpsnet/Decembre2008/Bockpdf.
citation_journal_title=p-Adic Numbers Ultrametric Anal. Appl.; citation_title=On p-adic classification; citation_author=P. E. Bradley; citation_volume=1; citation_issue=4; citation_publication_date=2009; citation_pages=271-285; citation_doi=10.1134/S2070046609040013; citation_id=CR3
L. Brekke and P. G. O. Freund, “p-Adic numbers in physics”, Phys. Rep. 233, 1–66 (1993).
citation_journal_title=Proc. Amer. Phil. Soc.; citation_title=The law of anomalous numbers; citation_author=F. Benford; citation_volume=78; citation_publication_date=1938; citation_pages=551-572; citation_id=CR5
citation_title=An Introduction to Benford’s Law; citation_publication_date=2015; citation_id=CR6; citation_author=A. Berger; citation_author=T. P. Hill
citation_journal_title=J. Class.; citation_title=Fast, linear time hierarchical clustering using the Baire metric; citation_author=P. Contreras, F. Murtagh; citation_volume=29; citation_publication_date=2012; citation_pages=118-143; citation_doi=10.1007/s00357-012-9106-3; citation_id=CR7
citation_journal_title=Comp. J.; citation_title=p-Adic modelling of the genome and the genetic code; citation_author=B. Dragovich, A. Dragovich; citation_volume=53; citation_publication_date=2010; citation_pages=432-42; citation_doi=10.1093/comjnl/bxm083; citation_id=CR8
citation_journal_title=Comp. J.; citation_title=Value-coded trie structure for high-performance IPv6 lookup; citation_author=O. Erdem, A. Carus, H. Le; citation_volume=58; citation_issue=2; citation_publication_date=2015; citation_pages=204-214; citation_doi=10.1093/comjnl/bxt153; citation_id=CR9
citation_title=p-Adic Numbers; citation_publication_date=2003; citation_id=CR10; citation_author=F. Q. Gouvea; citation_publisher=Springer
citation_journal_title=J. Royal Stat. Soc. Ser. B; citation_title=Geometric representation of high dimension, low sample size data; citation_author=P. Hall, J. S. Marron, A. Neeman; citation_volume=67; citation_publication_date=2005; citation_pages=427-444; citation_doi=10.1111/j.1467-9868.2005.00510.x; citation_id=CR11
citation_journal_title=IEICE Trans. Comm.; citation_title=A user’s guide to compressed sensing for communications systems; citation_author=K. Hayashi, M. Nagahara, T. Tanaka; citation_volume=E96-B; citation_issue=3; citation_publication_date=2013; citation_pages=685-712; citation_doi=10.1587/transcom.E96.B.685; citation_id=CR12
citation_journal_title=Stat. Sci.; citation_title=A statistical derivation of the significant-digit law; citation_author=T. P. Hill; citation_volume=10; citation_issue=4; citation_publication_date=1995; citation_pages=354-363; citation_id=CR13
citation_title=The Ternary Manifesto, including “Standard ternary logic,” “Ternary arithmetic,” “Ter- SCII: ternary standard code for information interchange,”; citation_publication_date=2012; citation_id=CR14; citation_author=D. W. Jones
citation_journal_title=Proc. Steklov Inst.Math.; citation_title=Gene expression from polynomial dynamics in the 2-adic information space; citation_author=A. Yu. Khrennikov; citation_volume=265; citation_publication_date=2009; citation_pages=131-139; citation_doi=10.1134/S0081543809020114; citation_id=CR15
citation_journal_title=Comptes-Rendus de l’Acadé mie des Sciences, Tome II; citation_title=Nombres semi-ré els et espaces ultramé triques; citation_author=M. Krasner; citation_volume=219; citation_publication_date=1944; citation_pages=433-435; citation_id=CR16
citation_journal_title=IEEE Trans. Comm.; citation_title=An algorithm for vector quantization design; citation_author=Y. Linde, A. Buzo, R. M. Gray; citation_volume=28; citation_publication_date=1980; citation_pages=84-95; citation_doi=10.1109/TCOM.1980.1094577; citation_id=CR17
citation_title=Linear embedding of binary hierarchies and its applications; citation_inbook_title=Mathematical Hierarchies and Biology, DIMACS Series Disc. Math. Theor. Comp. Sci.; citation_publication_date=1997; citation_pages=331-356; citation_id=CR18; citation_author=B. Mirkin
citation_title=A top-down method for building genome classification trees with linear binary hierarchies; citation_inbook_title=Bioconsensus, DIMACS Ser.; citation_publication_date=2003; citation_pages=97-112; citation_id=CR19; citation_author=B. Mirkin; citation_author=E. Koonin
citation_journal_title=Proc. Steklov Inst. Math.; citation_title=Symmetry in data mining and analysis: a unifying view based on hierarchy; citation_author=F. Murtagh; citation_volume=265; citation_publication_date=2009; citation_pages=177-198; citation_doi=10.1134/S0081543809020175; citation_id=CR20
citation_journal_title=p-Adic Numbers Ultrametric Anal. Appl.; citation_title=From data to the p-adic or ultrametic model; citation_author=F. Murtagh; citation_volume=1; citation_publication_date=2009; citation_pages=58-68; citation_doi=10.1134/S2070046609010063; citation_id=CR21
citation_journal_title=p-Adic Numbers Ultrametric Anal. Appl.; citation_title=Fast, linear time, m-adic hierarchical clustering for search and retrieval using the Baire metric, with linkages to generalized ultrametrics, hashing, Formal Concept Analysis, and precision of data measurement; citation_author=F. Murtagh, P. Contreras; citation_volume=4; citation_publication_date=2012; citation_pages=45-56; citation_id=CR22
citation_title=MoreLikeThis and Scoring in Solr; citation_publication_date=2013; citation_id=CR23; citation_author=F. Murtagh
citation_title=Linear storage and potentially constant time hierarchical clustering using the Baire metric and random spanning paths; citation_inbook_title=Analysis of Large and Complex Data; citation_publication_date=2016; citation_id=CR24; citation_author=F. Murtagh; citation_author=P. Contreras
citation_title=Clustering through high dimensional data scaling: applications and implementations; citation_inbook_title=Proceedings, ECDA 2015, European Conf. on Data Analysis, forthcoming; citation_publication_date=2016; citation_id=CR25; citation_author=F. Murtagh; citation_author=P. Contreras
citation_title=Big Data scaling through metric mapping: Exploiting the remarkable simplicity of very high dimensional spaces using Correspondence Analysis; citation_inbook_title=Proceedings, IFCS 2015, Int. Fed. Class. Soc., forthcoming; citation_publication_date=2016; citation_id=CR26; citation_author=F. Murtagh
citation_title=Random projection towards the Baire metric for high dimensional clustering; citation_inbook_title=Statistical Learning and Data Sciences, Lect. Notes Art. Intell.; citation_publication_date=2015; citation_pages=424-431; citation_id=CR27; citation_author=F. Murtagh; citation_author=P. Contreras
citation_title=Sparse coding; citation_publication_date=2013; citation_id=CR28; citation_author=A. Ng
citation_journal_title=Cont. Math.; citation_title=p-Adic arithmetic coding; citation_author=A. Rodionov, S. Volkov; citation_volume=508; citation_publication_date=2010; citation_pages=201-213; citation_doi=10.1090/conm/508/10001; citation_id=CR29
citation_title=p-Adic arithmetic coding; citation_publication_date=2007; citation_id=CR30; citation_author=A. Rodionov; citation_author=S. Volkov
citation_title=Ultrametric Calculus; citation_publication_date=1984; citation_id=CR31; citation_author=W. H. Schikhof; citation_publisher=Cambridge Univ. Press
citation_title=The Sciences of the Artificial; citation_publication_date=1996; citation_id=CR32; citation_author=H. A. Simon