Data lake governance using IBM-Watson knowledge catalog

Scientific African - Volume 21 - Page e01854 - 2023
Mohamed Cherradi¹, Fadwa Bouhafer¹, Anass EL Haddadi¹
¹ Data Science and Competitive Intelligence Team (DSCI), ENSAH, Abdelmalek Essaâdi University (UAE), Tetouan, Morocco
