Russian Web Tables: A Public Corpus of Web Tables for Russian Language Based on Wikipedia
Abstract
Keywords
References
Y. Wang and J. Hu, ‘‘Detecting tables in HTML documents,’’ in Document Analysis Systems V, Ed. by D. Lopresti, J. Hu, and R. Kashi (Springer, Berlin, 2002), pp. 249–260.
Sh. Zhang and K. Balog, ‘‘Web table extraction, retrieval, and augmentation: A survey,’’ ACM Trans. Intell. Syst. Technol. 11 (2) (2020).
Ch. Bhagavatula, Th. Noraset, and D. Downey, ‘‘TabEL: Entity linking in web tables,’’ in The Semantic Web—ISWC 2015, Ed. by M. Arenas, O. Corcho, E. Simperl, et al. (Springer Int., Cham, 2015), pp. 425–441.
M. J. Cafarella, A. Y. Halevy, Y. Zhang, D. Zhe Wang, and E. Wu, ‘‘Uncovering the relational web,’’ in Proceedings of the 11th International Workshop on the Web and Databases, WebDB 2008, Vancouver, BC, Canada, June 13, 2008 (2008).
J. Eberius, K. Braunschweig, M. Hentsch, M. Thiele, A. Ahmadov, and W. Lehner, ‘‘Building the Dresden web table corpus: A classification approach,’’ in Proceedings of the 2015 IEEE/ACM 2nd International Symposium on Big Data Computing BDC (2015), pp. 41–50.
O. Lehmberg, D. Ritze, R. Meusel, and Ch. Bizer, ‘‘A large public corpus of web tables containing time and context metadata,’’ in Proceedings of the 25th International Conference Companion on World Wide Web, WWW’16 Companion (Int. World Wide Web Conf. Steering Committee, Republic and Canton of Geneva, CHE, 2016), pp. 75–76.
El Kindi Rezig, A. Bhandari, A. Fariha, B. Price, A. Vanterpool, V. Gadepally, and M. Stonebraker, ‘‘DICE: data discovery by example,’’ Proc. VLDB Endow. 14, 2819–2822 (2021).
S. Castelo, R. Rampin, A. S. R. Santos, A. Bessa, F. Chirigati, and J. Freire, ‘‘Auctus: A dataset search engine for data discovery and augmentation,’’ Proc. VLDB Endow. 14, 2791–2794 (2021).
T. Bleifuß, L. Bornemann, D. V. Kalashnikov, F. Naumann, and D. Srivastava, ‘‘Structured object matching across web page revisions,’’ in Proceedings of the 2021 IEEE 37th International Conference on Data Engineering ICDE (2021), pp. 1284–1295.
T. Bleifuß, L. Bornemann, D. V. Kalashnikov, F. Naumann, and D. Srivastava, ‘‘The secret life of Wikipedia tables,’’ in Proceedings of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEAData), co-located with VLDB (2021).
P. Ouellette, A. Sciortino, F. Nargesian, B. Ghadiri Bashardoost, E. Zhu, K. Pu, and R. J. Miller, ‘‘RONIN: Data lake exploration,’’ Proc. VLDB Endow. 14, 2863–2866 (2021).
X. Deng, H. Sun, A. Lees, Y. Wu, and C. Yu, ‘‘TURL: Table understanding through representation learning,’’ Proc. VLDB Endow. 14, 307–319 (2020).
M. Hulsebos, K. Hu, M. Bakker, E. Zgraggen, A. Satyanarayan, T. Kraska, Ç. Demiralp, and C. Hidalgo, ‘‘Sherlock: A deep learning approach to semantic data type detection,’’ in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’19 (Assoc. Comput. Machin., New York, NY, USA, 2019), pp. 1500–1508.