An Approach to Extracting Topic-guided Views from the Sources of a Data Lake

Information Systems Frontiers - Tập 23 Số 1 - Trang 243-262 - 2021
Claudia Diamantini1, Paolo Lo Giudice2, Domenico Potena1, Emanuele Storti1, Domenico Ursino1
1DII, Polytechnic University of Marche, Ancona, Italy
2DIIES, University Mediterranea of Reggio Calabria, Reggio Calabria, Italy

Tóm tắt

Từ khóa


Tài liệu tham khảo

Abiteboul, S., & Duschka, O. (1998). Complexity of answering queries using materialized views. In Proc. of the International Symposium on Principles of Database Systems (SIGMOD/PODS’98) (pp. 254– 263). Seattle: ACM.

Aversano, L., Intonti, R., Quattrocchi, C., & Tortorella, M. (2010). Building a virtual view of heterogeneous data source views. In Proc. of the International Conference on Software and Data Technologies (ICSOFT’10) (pp. 266–275). Athens: INSTICC Press.

Bachtarzi, C., & Bachtarzi, F. (2015). A model-driven approach for materialized views definition over heterogeneous databases. In Proc. of the International Conference on New Technologies of Information and Communication (NTIC’15) (pp. 1–5). Mila: IEEE.

Bergamaschi, S., Castano, S., Vincini, M., & Beneventano, D. (2001). Semantic integration and query of heterogeneous information sources. Data & Knowledge Engineering, 36(3), 215–249.

Bidoit, N., Colazzo, D., Malla, N., & Sartiani, C. (2018). Evaluating queries and updates on big xml documents. Information Systems Frontiers, 20(1), 63–90.

Bilalli, B., Abelló, A., Aluja-Banet, T., & Wrembel, R. (2016). Towards intelligent data analysis: the metadata challenge. In Proc. of the International Conference on Internet of Things and Big Data (ioTBD’16) (pp. 331–338). Rome, Italy.

Biskup, J., & Embley, D. (2003). Extracting information from heterogeneous information sources using ontologically specified target views. Information Systems, 28(3), 169–212. Elsevier.

Blei, D., Ng, A., & Jordan, M. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022. Microtone Publishing.

Bouadjenek, M.R., Hacid, H., & Bouzeghoub, M. (2016). Social networks and information retrieval, how are they converging? A survey, a taxonomy and an analysis of social information retrieval approaches and platforms. Information Systems, 56, 1–18.

Bougouin, A., Boudin, F., & Daille, B. (2013). Topicrank: Graph-based topic ranking for keyphrase extraction. In Proc.of the International Joint Conference on Natural Language Processing (IJCNLP’13) (pp. 543–551). Nagoya: Asian Federation of Natural Language Processing.

Brackenbury, W., Liu, R., Mondal, M., Elmore, A., Ur, B., Chard, K., & Franklin, M. (2018). Draining the data swamp: A similarity-based approach. In Proc. of the International Workshop on Human-in-the-loop Data Analytics (HILDA’18) (p. 13). Houston: ACM.

Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., & Jatowt, A. (2020). YAKE! Keyword extraction from single documents using multiple local features. Information Sciences, 509, 257–289. Elsevier.

Castano, S., & Antonellis, V.D. (1999). Building views over semistructured data sources. In Proc. of the International Conference on Conceptual Modeling (ER’99) (pp. 146–160). Paris: Springer.

Chen, C., Shyu, M.-L., & Chen, S.-C. (2016). Weighted subspace modeling for semantic concept retrieval using gaussian mixture models. Information Systems Frontiers, 18(5), 877–889.

Corbellini, A., Mateos, C., Zunino, A., Godoy, D., & Schiaffino, S. (2017). Persisting big-data: The NoSQL landscape. Information Systems, 63, 1–23. Elsevier.

De Meo, P., Quattrone, G., Terracina, G., & Ursino, D. (2006). Integration of XML Schemas at various “severity” levels. Information Systems, 31(6), 397–434.

Debattista, J., Lange, C., & Auer, S. (2014). Representing dataset quality metadata using multi-dimensional views. In Proc. of the International Conference on Semantic Systems (SEM’14) (pp. 92–99). Leipzig: ACM.

Dessi, A., & Atzori, M. (2016). A machine-learning approach to ranking rdf properties. Future Generation Computer Systems, 54, 366–377.

Dublin Core Metadata Initiative. (2012). DCMI Metadata Terms. Technical report.

Fan, W., Wang, X., & Wu, Y. (2016). Answering pattern queries using views. IEEE Transactions on Knowledge and Data Engineering, 28(2), 326–341. IEEE.

Fang, H. (2015). Managing data lakes in big data era: What’s a data lake and why has it became popular in data management ecosystem. In Proc. of the International Conference on Cyber Technology in Automation (CYBER’15) (pp. 820–824). Shenyang: IEEE.

Farid, M., Roatis, A., Ilyas, I., Hoffmann, H., & Chu, X. (2016). CLAMS: bringing quality to data lakes. In Proc. of the International Conference on Management of Data (SIGMOD/PODS’16) (pp. 2089–2092). San Francisco: ACM.

García-Moya, L., Kudama, S., Aramburu, M., & Berlanga, R. (2013). Storing and analysing voice of the market data in the corporate data warehouse. Information Systems Frontiers, 15(3), 331–349.

Hai, R., Geisler, S., & Quix, C. (2016). Constance: an intelligent data lake system. In Proc. of the International Conference on Management of Data (SIGMOD 2016) (pp. 2097–2100). San Francisco: ACM.

Hai, R., Quix, C., & Zhou, C. (2018). Query rewriting for heterogeneous data lakes. In Proc. of the International Conference on European Conference on Advances in Databases and Information Systems(ADBIS’18) (pp. 35–49). Budapest: Springer.

Halevy, A. (2001). Answering queries using views: A survey. The VLDB Journal, 10(4), 270–294. Springer.

Hamadou, H., & Ghozzi, F. (2018). Querying heterogeneous document stores. In Proc. of the International Conference on Enterprise Information Systems (ICEIS’18) (pp. 58–68). Madeira, Portugal.

Heath, T., & Bizer, C. (2011). Linked data:, Evolving the web into a global data space. Synthesis lectures on the semantic web: theory and technology, 1(1), 1–136.

Hirschman, A. (1964). The paternity of an index. The American Economic Review, 54(5), 761–762.

Hitzler, P., & Janowicz, K. (2013). Linked data, big data, and the 4th paradigm. Semantic Web, 4(3), 233–235.

Janjua, N., Hussain, F., & Hussain, O. (2013). Semantic information and knowledge integration through argumentative reasoning to support intelligent decision making. Information Systems Frontiers, 15(2), 167–192.

Keith, A., Cyganiak, R., Hausenblas, M., & Zhao, J. (2011). Describing linked datasets with the void vocabulary. Technical report.

Klettke, M., Awolin, H., Storl, U., Muller, D., & Scherzinger, S. (2017). Uncovering the evolution history of data lakes. In Proc. of the International Conference on Big data (IEEE bigdata 2017) (pp. 2462–2471). Boston: IEEE.

Kondrak, G. (2005). N-gram similarity and distance. In String processing and Information Retrieval (pp. 115–126): Springer.

Konstantinou, N., Koehler, M., Abel, E., Civili, C., Neumayr, B., Sallinger, E., Fernandes, A., Gottlob, G., Keane, J., & Libkin, L. (2017). The VADA architecture for cost-effective data wrangling. In Proc. of the International Conference on Management of Data (SIGMOD’17) (pp. 1599–1602). Chicago: ACM.

Lassila, O., Swick, R.R., & et al. (1998). Resource description framework (rdf) model and syntax specification.

Maccioni, A., & Torlone, R. (2018). KAYAK: a framework for just-in-time data preparation in a data lake. In Proc. of the international Conference on Advanced information Systems Engineering (CAiSE’18) (pp. 474–489). Tallinn: Springer.

Madhavan, J., Bernstein, P., & Rahm, E. (2001). Generic schema matching with Cupid. In Proc.of the international conference on very large data bases (VLDB 2001) (pp. 49–58). Morgan Kaufmann: Rome.

McPherson, M., Smith-Lovin, L., & Cook, J. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444. JSTOR.

Mouttham, A., Kuziemsky, C., Langayan, D., Peyton, L., & Pereira, J. (2012). Interoperable support for collaborative, mobile, and accessible health care. Information Systems Frontiers, 14(1), 73–85.

Mouzakitis, S., Papaspyros, D., Petychakis, M., Koussouris, S., Zafeiropoulos, A., Fotopoulou, E., Farid, L., Orlandi, F., Attard, J., & Psarras, J. (2017). Challenges and opportunities in renovating public sector information by enabling linked data and analytics. Information Systems Frontiers, 19(2), 321–336.

Tsvetovat, M., & Kouznetsov, A. (2011). Social Network Analysis for startups: Finding connections on the social web. O’Reilly Media Inc.

Navigli, R., & Ponzetto, S. (2012). BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193, 217–250. Elsevier.

Oram, A. (2015). Managing the Data Lake Sebastopol. O’Reilly: USA.

Palopoli, L., Pontieri, L., Terracina, G., & Ursino, D. (2000). Intensional and extensional integration and abstraction of heterogeneous databases. Data & Knowledge Engineering, 35(3), 201–237.

Palopoli, L., Saccà, D., Terracina, G., & Ursino, D. (2003a). Uniform techniques for deriving similarities of objects and subschemes in heterogeneous databases. IEEE Transactions on Knowledge and Data Engineering, 15 (2), 271–294.

Palopoli, L., Terracina, G., & Ursino, D. (2001). A graph-based approach for extracting terminological properties of elements of XML documents. In Proc. of the International Conference on Data Engineering (ICDE 2001) (pp. 330–337). Heidelberg: IEEE Computer Society.

Palopoli, L., Terracina, G., & Ursino, D. (2003b). DIKE: A system supporting the semi-automatic construction of Cooperative Information Systems from heterogeneous databases. Software Practice & Experience, 33(9), 847–884.

Palopoli, L., Terracina, G., & Ursino, D. (2003c). Experiences using DIKE, a system for supporting cooperative information system and data warehouse design. Information Systems, 28(7), 835–865.

Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic keyword extraction from individual documents. Text Mining: Applications and Theory, 1, 1–20. Wiley, New York.

Singh, K., & Singh, V. (2016). Answering graph pattern query using incremental views. In Proc.of the international conference on computing (ICCCA’16) (pp. 54–59). Greater Noida: IEEE.

Spink, A., Wolfram, D., Jansen, M.B.J., & Saracevic, T. (2001). Searching the web: the public and their queries. Journal of the American Society for Information Science and Technology, 52(3), 226–234.

Wang, J., Li, J., & Yu, J. (2011). Answering tree pattern queries using views: a revisit. In Proc.of the international conference on extending database technology (EDBT/ICDT’11) (pp. 153–164). Uppsala: ACM.

Wang, J., & Yu, J. (2012). Revisiting answering tree pattern queries using views. ACM Transactions on Database Systems, 37(3), 18. ACM.

Wu, X., Theodoratos, D., & Wang, W. (2009). Answering XML queries using materialized views revisited. In Proc. of the International Conference on Information and Knowledge Management (CIKM ’09) (pp. 475–484). Hong Kong: ACM.

Yi, J., Maghoul, F., & Pedersen, J. (2008). Deciphering mobile search patterns: a study of yahoo! mobile search queries. In Proceedings of the 17th International Conference on World Wide Web, WWW ’08 (pp. 257–266). New York: ACM.