Learning with Small Data: Subgraph Counting Queries

Data Science and Engineering - Volume 8, Pages 292-305 - 2023
Kangfei Zhao1, Zongyan He2, Jeffrey Xu Yu2, Yu Rong3
1Beijing Institute of Technology, Beijing, China
2The Chinese University of Hong Kong, Hong Kong, China
3Tencent AI Lab, Shenzhen, China

Abstract

Deep Learning (DL) has been widely used in many applications, and its success rests on large training data. A key issue is how to provide a DL solution when there is no large training set to learn from initially. In this paper, we explore a meta-learning approach for a specific problem, subgraph isomorphism counting, a fundamental problem in graph analysis: counting the number of subgraphs of a data graph, g, that match a given pattern graph, p. There are various data graphs and pattern graphs, and a subgraph isomorphism counting query is specified by a pair, (g, p). This problem is NP-hard and inherently requires large training data to be learned by DL. We design a Gaussian Process (GP) model that combines a Graph Neural Network with Bayesian nonparametrics, and we train the GP by a meta-learning algorithm on a small set of training data. Through meta-learning, we obtain a generalized meta-model that better encodes the information of data and pattern graphs and captures the prior of small tasks. With the learned meta-model, we handle a collection of pairs (g, p) as a task, where some pairs may be associated with the ground truth and others are the queries to answer. There are two cases: in the few-shot case, some pairs come with the ground truth; in the zero-shot case, none do. We provide solutions for both. In particular, for zero-shot, we propose a new data-driven approach to predict the count values. Note that zero-shot learning for our regression tasks is difficult, and no hands-on solution exists in the literature. We conducted extensive experimental studies to confirm that our approach is robust to model degeneration on small training data, and that our meta-model adapts quickly to new queries by few-shot and zero-shot learning.
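To make the query semantics concrete, the following is a minimal, illustrative sketch (not the paper's learned model) of what a subgraph isomorphism counting query (g, p) computes, via brute-force enumeration of injective, edge-preserving mappings of the pattern's nodes into the data graph. The graphs, edge lists, and the helper name `count_subgraph_isomorphisms` are hypothetical examples, not artifacts from the paper.

```python
from itertools import permutations

def count_subgraph_isomorphisms(g_edges, p_edges):
    """Brute-force answer to a counting query (g, p) on undirected graphs.

    Counts injective node mappings phi from p into g such that every edge
    (u, v) of p maps to an edge (phi[u], phi[v]) of g. Each distinct
    embedding is counted once per automorphism of p. Exponential time,
    for illustration only -- the problem is NP-hard in general.
    """
    g_nodes = {u for e in g_edges for u in e}
    p_nodes = sorted({u for e in p_edges for u in e})
    g_edge_set = {frozenset(e) for e in g_edges}
    count = 0
    for image in permutations(g_nodes, len(p_nodes)):
        phi = dict(zip(p_nodes, image))
        if all(frozenset((phi[u], phi[v])) in g_edge_set for u, v in p_edges):
            count += 1
    return count

# Data graph g: a 4-cycle with a chord; pattern p: a triangle.
g = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
p = [(0, 1), (1, 2), (2, 0)]
# g contains 2 triangles ({0,1,2} and {0,2,3}); the triangle has
# 3! = 6 automorphisms, so 12 mappings are found in total.
print(count_subgraph_isomorphisms(g, p))  # prints 12
```

A learned estimator, such as the GP model in the paper, would replace this enumeration with a prediction from encodings of g and p, which is what makes few-shot and zero-shot adaptation to new (g, p) pairs meaningful.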
