Scientific Data

SCIE-ISI SCOPUS (2014-2023)

ISSN: 2052-4463

 

United Kingdom

Publisher: Nature Portfolio, Nature Publishing Group

Subject areas:
Statistics and Probability; Computer Science Applications; Information Systems; Education; Library and Information Sciences; Statistics, Probability and Uncertainty

Representative articles

The FAIR Guiding Principles for scientific data management and stewardship
Volume 3, Issue 1
Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan‐Willem Boiten, Luiz Olavo Bonino da Silva Santos, Philip E. Bourne, Jildau Bouwman, Anthony J. Brookes, Tim W. Clark, Mercè Crosas, Ingrid Dillo, Olivier Dumon, Scott Edmunds, Chris T. Evelo, Richard Finkers, Alejandra González-Beltrán, Alasdair J. G. Gray, Paul Groth, Carole Goble, Jeffrey S. Grethe, Jaap Heringa, Peter A.C. ’t Hoen, Rob Hooft, Tobias Kuhn, Ruben Kok, Joost N. Kok, Scott J. Lusher, Maryann E. Martone, Albert Mons, Abel L. Packer, Bengt Persson, Philippe Rocca‐Serra, Marco Roos, René van Schaik, Susanna‐Assunta Sansone, Erik Schultes, Thierry Sengstag, Ted Slater, George Strawn, Morris A. Swertz, Mark Thompson, Johan van der Lei, Erik M. van Mulligen, Jan Velterop, Andra Waagmeester, Peter Wittenburg, Katherine Wolstencroft, Jun Zhao, Barend Mons
Abstract

There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.

MIMIC-III, a freely accessible critical care database
Volume 3, Issue 1
Alistair E. W. Johnson, Tom Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Mohammad M. Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, Roger G. Mark
Abstract

MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.

The climate hazards infrared precipitation with stations—a new environmental record for monitoring extremes
Volume 2, Issue 1
Chris Funk, Peter Y. Peterson, M. F. Landsfeld, Diego Pedreros, J. P. Verdin, Shraddhanand Shukla, G. J. Husak, James Rowland, L. Harrison, Andrew Hoell, Joel Michaelsen
Abstract

The Climate Hazards group Infrared Precipitation with Stations (CHIRPS) dataset builds on previous approaches to ‘smart’ interpolation techniques and high resolution, long period of record precipitation estimates based on infrared Cold Cloud Duration (CCD) observations. The algorithm i) is built around a 0.05° climatology that incorporates satellite information to represent sparsely gauged locations, ii) incorporates daily, pentadal, and monthly 1981-present 0.05° CCD-based precipitation estimates, iii) blends station data to produce a preliminary information product with a latency of about 2 days and a final product with an average latency of about 3 weeks, and iv) uses a novel blending procedure incorporating the spatial correlation structure of CCD-estimates to assign interpolation weights. We present the CHIRPS algorithm, global and regional validation results, and show how CHIRPS can be used to quantify the hydrologic impacts of decreasing precipitation and rising air temperatures in the Greater Horn of Africa. Using the Variable Infiltration Capacity model, we show that CHIRPS can support effective hydrologic forecasts and trend analyses in southeastern Ethiopia.

Present and future Köppen-Geiger climate classification maps at 1-km resolution
Volume 5, Issue 1
Hylke E. Beck, Niklaus E. Zimmermann, Tim R. McVicar, Noemi Vergopolan, Alexis Berg, Eric F. Wood
Abstract

We present new global maps of the Köppen-Geiger climate classification at an unprecedented 1-km resolution for the present-day (1980–2016) and for projected future conditions (2071–2100) under climate change. The present-day map is derived from an ensemble of four high-resolution, topographically-corrected climatic maps. The future map is derived from an ensemble of 32 climate model projections (scenario RCP8.5), by superimposing the projected climate change anomaly on the baseline high-resolution climatic maps. For both time periods we calculate confidence levels from the ensemble spread, providing valuable indications of the reliability of the classifications. The new maps exhibit a higher classification accuracy and substantially more detail than previous maps, particularly in regions with sharp spatial or elevation gradients. We anticipate the new maps will be useful for numerous applications, including species and vegetation distribution modeling. The new maps including the associated confidence maps are freely available via www.gloh2o.org/koppen.
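
As an illustration of how a confidence level can be derived from ensemble spread, the sketch below takes the most frequent class among ensemble members in each grid cell and reports the share of members that agree with it. This is one plausible reading, not necessarily the authors' procedure; the function name, the class codes, and the toy grid are all assumptions made for the example.

```python
import numpy as np

def modal_class_and_confidence(ensemble):
    """Return the most frequent class per grid cell and the percentage
    of ensemble members that agree with it.

    ensemble: integer array of shape (members, rows, cols) with one
    class code per member and cell (hypothetical layout).
    """
    ensemble = np.asarray(ensemble)
    classes = np.unique(ensemble)
    # counts[k] holds, per cell, how many members chose classes[k]
    counts = np.stack([(ensemble == c).sum(axis=0) for c in classes])
    modal = classes[counts.argmax(axis=0)]
    confidence = 100.0 * counts.max(axis=0) / ensemble.shape[0]
    return modal, confidence

# Four hypothetical ensemble members on a 1 x 3 grid; codes are arbitrary.
members = np.array([
    [[1, 2, 3]],
    [[1, 2, 4]],
    [[1, 5, 4]],
    [[1, 2, 4]],
])
modal, confidence = modal_class_and_confidence(members)
print(modal)       # modal class per cell
print(confidence)  # agreement in percent per cell
```

Cells where the members disagree receive a lower percentage, which is the sense in which ensemble spread indicates how reliable a classification is.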

Climatologies at high resolution for the earth’s land surface areas
Volume 4, Issue 1
Dirk Nikolaus Karger, Olaf Conrad, Jürgen Böhner, Tobias Kawohl, Holger Kreft, Rodrigo Wilber Soria-Auza, Niklaus E. Zimmermann, H. Peter Linder, Michael Kessler
Abstract

High-resolution information on climatic conditions is essential to many applications in environmental and ecological sciences. Here we present the CHELSA (Climatologies at high resolution for the earth’s land surface areas) data of downscaled model output temperature and precipitation estimates of the ERA-Interim climatic reanalysis to a high resolution of 30 arc sec. The temperature algorithm is based on statistical downscaling of atmospheric temperatures. The precipitation algorithm incorporates orographic predictors including wind fields, valley exposition, and boundary layer height, with a subsequent bias correction. The resulting data consist of a monthly temperature and precipitation climatology for the years 1979–2013. We compare the data derived from the CHELSA algorithm with other standard gridded products and station data from the Global Historical Climate Network. We compare the performance of the new climatologies in species distribution modelling and show that we can increase the accuracy of species range predictions. We further show that CHELSA climatological data has a similar accuracy as other products for temperature, but that its predictions of precipitation patterns are better.

Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features
Volume 4, Issue 1
Spyridon Bakas, Hamed Akbari, Aristeidis Sotiras, Michel Bilello, Martin Rozycki, Justin Kirby, John Freymann, Keyvan Farahani, Christos Davatzikos
Abstract

Gliomas belong to a group of central nervous system tumors, and consist of various sub-regions. Gold standard labeling of these sub-regions in radiographic imaging is essential for both clinical and computational studies, including radiomic and radiogenomic analyses. Towards this end, we release segmentation labels and radiomic features for all pre-operative multimodal magnetic resonance imaging (MRI) (n=243) of the multi-institutional glioma collections of The Cancer Genome Atlas (TCGA), publicly available in The Cancer Imaging Archive (TCIA). Pre-operative scans were identified in both glioblastoma (TCGA-GBM, n=135) and low-grade-glioma (TCGA-LGG, n=108) collections via radiological assessment. The glioma sub-region labels were produced by an automated state-of-the-art method and manually revised by an expert board-certified neuroradiologist. An extensive panel of radiomic features was extracted based on the manually-revised labels. This set of labels and features should enable i) direct utilization of the TCGA/TCIA glioma collections towards repeatable, reproducible and comparative quantitative studies leading to new predictive, prognostic, and diagnostic assessments, as well as ii) performance evaluation of computer-aided segmentation methods, and comparison to our state-of-the-art method.

TerraClimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958–2015
Volume 5, Issue 1
John T. Abatzoglou, Solomon Z. Dobrowski, Sean A. Parks, Katherine C. Hegewisch
Abstract

We present TerraClimate, a dataset of high-spatial resolution (1/24°, ~4-km) monthly climate and climatic water balance for global terrestrial surfaces from 1958–2015. TerraClimate uses climatically aided interpolation, combining high-spatial resolution climatological normals from the WorldClim dataset, with coarser resolution time varying (i.e., monthly) data from other sources to produce a monthly dataset of precipitation, maximum and minimum temperature, wind speed, vapor pressure, and solar radiation. TerraClimate additionally produces monthly surface water balance datasets using a water balance model that incorporates reference evapotranspiration, precipitation, temperature, and interpolated plant extractable soil water capacity. These data provide important inputs for ecological and hydrological studies at global scales that require high spatial resolution and time varying climate and climatic water balance data. We validated spatiotemporal aspects of TerraClimate using annual temperature, precipitation, and calculated reference evapotranspiration from station data, as well as annual runoff from streamflow gauges. TerraClimate datasets showed noted improvement in overall mean absolute error and increased spatial realism relative to coarser resolution gridded datasets.
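
The idea behind climatically aided interpolation can be sketched as follows: a coarse-resolution monthly anomaly is spread onto the fine grid and added to a fine-resolution climatological normal. This is a minimal sketch under stated assumptions, not the TerraClimate implementation: the additive anomaly, the nearest-neighbour upsampling, and all names and values are simplifications chosen for the example, and the abstract does not specify how individual variables are treated.

```python
import numpy as np

def climatically_aided_interpolation(fine_normal, coarse_value, coarse_normal, factor):
    """Combine a fine-resolution normal with a coarse-resolution anomaly.

    fine_normal:   (H * factor, W * factor) long-term mean on the fine grid
    coarse_value:  (H, W) value for one month on the coarse grid
    coarse_normal: (H, W) long-term mean on the coarse grid
    factor:        number of fine cells per coarse cell along each axis
    """
    anomaly = coarse_value - coarse_normal
    # Nearest-neighbour upsampling (an assumption): each coarse cell
    # becomes a factor x factor block on the fine grid.
    fine_anomaly = np.kron(anomaly, np.ones((factor, factor)))
    return fine_normal + fine_anomaly

# Hypothetical inputs: a 4 x 4 fine grid and a 2 x 2 coarse grid.
fine_normal = np.arange(16, dtype=float).reshape(4, 4)
coarse_normal = np.array([[10.0, 12.0], [14.0, 16.0]])
coarse_value = np.array([[11.0, 11.0], [14.0, 18.0]])
monthly = climatically_aided_interpolation(fine_normal, coarse_value, coarse_normal, 2)
print(monthly)
```

The fine-scale spatial pattern comes entirely from the normal, while the month-to-month variation comes from the coarse data.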

The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments
Volume 3, Issue 1
Krzysztof J. Gorgolewski, Tibor Auer, Vince D. Calhoun, R. Cameron Craddock, Samir Das, Eugene Duff, Guillaume Flandin, Satrajit Ghosh, Tristan Glatard, Yaroslav Halchenko, Daniel A. Handwerker, Michael Hanke, David B. Keator, Xiangrui Li, Zachary Michael, Camille Maumet, B. Nolan Nichols, Thomas E. Nichols, John Pellman, Jean‐Baptiste Poline, Ariel Rokem, Gunnar Schaefer, Vanessa Sochat, William Triplett, Jessica A. Turner, Gaël Varoquaux, Russell A. Poldrack
Abstract

The development of magnetic resonance imaging (MRI) techniques has defined modern neuroimaging. Since its inception, tens of thousands of studies using techniques such as functional MRI and diffusion weighted imaging have allowed for the non-invasive study of the brain. Despite the fact that MRI is routinely used to obtain data for neuroscience research, there has been no widely adopted standard for organizing and describing the data collected in an imaging experiment. This renders sharing and reusing data (within or between labs) difficult if not impossible and unnecessarily complicates the application of automatic pipelines and quality assurance protocols. To solve this problem, we have developed the Brain Imaging Data Structure (BIDS), a standard for organizing and describing MRI datasets. The BIDS standard uses file formats compatible with existing software, unifies the majority of practices already common in the field, and captures the metadata necessary for most common data processing operations.

China CO2 emission accounts 1997–2015
Volume 5, Issue 1
Yuli Shan, Dabo Guan, Heran Zheng, Jiamin Ou, Yuan Li, Jing Meng, Zhifu Mi, Zhu Liu, Qiang Zhang
Abstract

China is the world’s top energy consumer and CO2 emitter, accounting for 30% of global emissions. Compiling an accurate accounting of China’s CO2 emissions is the first step in implementing reduction policies. However, no annual, officially published emissions data exist for China. The current emissions estimated by academic institutes and scholars exhibit great discrepancies. The gap between the different emissions estimates is approximately equal to the total emissions of the Russian Federation (the 4th highest emitter globally) in 2011. In this study, we constructed the time-series of CO2 emission inventories for China and its 30 provinces. We followed the Intergovernmental Panel on Climate Change (IPCC) emissions accounting method with a territorial administrative scope. The inventories include energy-related emissions (17 fossil fuels in 47 sectors) and process-related emissions (cement production). The first version of our dataset presents emission inventories from 1997 to 2015. We will update the dataset annually. The uniformly formatted emission inventories provide data support for further emission-related research as well as emissions reduction policy-making in China.
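
The basic IPCC accounting approach multiplies activity data by an emission factor and sums the results. The sketch below applies that to energy-related emissions by sector and fuel, plus a process-related term for cement. It is illustrative only: the sector names, fuel names, quantities, and factors are hypothetical placeholders, not values from the dataset, and the paper's exact parameterisation is not given in the abstract.

```python
def energy_emissions(activity, emission_factors):
    """Sum activity * emission factor over (sector, fuel) pairs.

    activity:         {(sector, fuel): amount of fuel consumed}
    emission_factors: {fuel: CO2 emitted per unit of fuel}
    """
    return sum(
        amount * emission_factors[fuel]
        for (_sector, fuel), amount in activity.items()
    )

def process_emissions(production, factor):
    """Process-related emissions, e.g. cement output * emission factor."""
    return production * factor

# Hypothetical inputs in arbitrary but consistent units.
activity = {
    ("power", "coal"): 100.0,
    ("transport", "oil"): 50.0,
    ("power", "gas"): 20.0,
}
emission_factors = {"coal": 2.0, "oil": 3.0, "gas": 1.5}

total = energy_emissions(activity, emission_factors) + process_emissions(10.0, 0.5)
print(total)
```

Keeping the inventory keyed by sector and fuel makes it straightforward to report subtotals for either dimension.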

Charting the complete elastic properties of inorganic crystalline compounds
Volume 2, Issue 1
Maarten de Jong, Wei Chen, Thomas Angsten, Anubhav Jain, Randy Notestine, Anthony Gamst, Marcel H. F. Sluiter, Chaitanya Krishna Ande, Sybrand van der Zwaag, José J. Plata, Cormac Toher, Stefano Curtarolo, Gerbrand Ceder, Kristin A. Persson, Mark Asta
Abstract

The elastic constant tensor of an inorganic compound provides a complete description of the response of the material to external stresses in the elastic limit. It thus provides fundamental insight into the nature of the bonding in the material, and it is known to correlate with many mechanical properties. Despite the importance of the elastic constant tensor, it has been measured for a very small fraction of all known inorganic compounds, a situation that limits the ability of materials scientists to develop new materials with targeted mechanical responses. To address this deficiency, we present here the largest database of calculated elastic properties for inorganic compounds to date. The database currently contains full elastic information for 1,181 inorganic compounds, and this number is growing steadily. The methods used to develop the database are described, as are results of tests that establish the accuracy of the data. In addition, we document the database format and describe the different ways it can be accessed and analyzed in efforts related to materials discovery and design.