Why good data analysts need to be critical synthesists. Determining the role of semantics in data analysis

Future Generation Computer Systems - Volume 72 - Pages 11-22 - 2017
Simon Scheider¹, Frank O. Ostermann², Benjamin Adams³
¹ Human Geography and Spatial Planning, Utrecht, Netherlands
² Faculty of Geo-Information Science and Earth Observation (ITC), Enschede, Netherlands
³ Department of Geography, University of Canterbury, Christchurch, New Zealand
