Array programming with NumPy

Nature - Volume 585, Issue 7825 - Pages 357-362 - 2020
C. R. Harris1, K. Jarrod Millman2, Stéfan van der Walt2, Ralf Gommers3, Pauli Virtanen4, David Cournapeau5, Eric Wieser6, Julian Taylor7, Sebastian Berg8, Nathaniel J. Smith9, Robert Kern10, Matti Picus8, Stephan Hoyer11, M. H. van Kerkwijk12, Matthew Brett13, Allan Haldane14, Jaime Fernández del Río15, Mark Wiebe16, Pearu Peterson3, P. Gerard-Marchant17, Kevin Sheppard18, Tyler Reddy19, Warren Weckesser8, Hameer Abbasi3, Christoph Gohlke20, Travis E. Oliphant3
1Independent researcher, Logan, UT, USA
2Brain Imaging Center, University of California, Berkeley, Berkeley, CA, USA
3Quansight, Austin, TX, USA
4Department of Physics, University of Jyväskylä, Jyväskylä, Finland
5Mercari JP, Tokyo, Japan
6Department of Engineering, University of Cambridge, Cambridge, UK
7Independent researcher, Karlsruhe, Germany
8Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, CA, USA
9Independent researcher, Berkeley, CA, USA
10Enthought, Austin, TX, USA
11Google Research, Mountain View, CA, USA
12Department of Astronomy and Astrophysics, University of Toronto, Toronto, Ontario, Canada
13School of Psychology, University of Birmingham, Edgbaston, Birmingham, UK
14Department of Physics, Temple University, Philadelphia, PA, USA
15Google, Zurich, Switzerland
16Department of Physics and Astronomy, The University of British Columbia, Vancouver, British Columbia, Canada
17Department of Biological and Agricultural Engineering, University of Georgia, Athens, GA, USA
18Department of Economics, University of Oxford, Oxford, UK
19CCS-7, Los Alamos National Laboratory, Los Alamos, NM, USA
20Laboratory for Fluorescence Dynamics, Biomedical Engineering Department, University of California, Irvine, Irvine, CA, USA

Abstract

Array programming provides a powerful, compact and expressive syntax for accessing, manipulating and operating on data in vectors, matrices and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It has an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, materials science, engineering, finance and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves¹ and in the first imaging of a black hole². Here we review how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring and analysing scientific data. NumPy is the foundation upon which the scientific Python ecosystem is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Owing to its central position in the ecosystem, NumPy increasingly acts as an interoperability layer between such array computation libraries and, together with its application programming interface (API), provides a flexible framework to support the next decade of scientific and industrial analysis.
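
As an illustration of the accessing, manipulating and operating that the abstract refers to, the following minimal sketch applies slicing, boolean masking, broadcasting and an axis reduction to a small array. The array contents and variable names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

# Hypothetical data: a 2-D array with 3 rows (samples) and 4 columns (channels).
data = np.arange(12, dtype=float).reshape(3, 4)

# Accessing: slicing selects the first two columns of every row.
first_two = data[:, :2]

# Manipulating: a boolean mask picks out elements without an explicit loop.
large = data[data > 6.0]

# Operating: broadcasting subtracts the per-column mean (shape (4,))
# from every row of the (3, 4) array in a single expression.
centered = data - data.mean(axis=0)

# Reducing: summing along axis 1 gives one total per row.
row_sums = data.sum(axis=1)

print(first_two.shape)  # shape of the sliced result
print(large)            # elements greater than 6
print(centered)         # column-centred data
print(row_sums)         # per-row totals
```

Each step is expressed on whole arrays rather than element by element, which is the sense in which array programming is compact.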

References

Abbott, B. P. et al. Observation of gravitational waves from a binary black hole merger. Phys. Rev. Lett. 116, 061102 (2016).

Chael, A. et al. High-resolution linear polarimetric imaging for the Event Horizon Telescope. Astrophys. J. 829, 11 (2016).

Dubois, P. F., Hinsen, K. & Hugunin, J. Numerical Python. Comput. Phys. 10, 262–267 (1996).

Ascher, D., Dubois, P. F., Hinsen, K., Hugunin, J. & Oliphant, T. E. An Open Source Project: Numerical Python (Lawrence Livermore National Laboratory, 2001).

Yang, T.-Y., Furnish, G. & Dubois, P. F. Steering object-oriented scientific computations. In Proc. TOOLS USA 97. Intl Conf. Technology of Object Oriented Systems and Languages (eds Ege, R., Singh, M. & Meyer, B.) 112–119 (IEEE, 1997).

Greenfield, P., Miller, J. T., Hsu, J. & White, R. L. numarray: a new scientific array package for Python. In PyCon DC 2003 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.112.9899 (2003).

Oliphant, T. E. Guide to NumPy 1st edn (Trelgol Publishing, 2006).

Dubois, P. F. Python: batteries included. Comput. Sci. Eng. 9, 7–9 (2007).

Oliphant, T. E. Python for scientific computing. Comput. Sci. Eng. 9, 10–20 (2007).

Millman, K. J. & Aivazis, M. Python for scientists and engineers. Comput. Sci. Eng. 13, 9–12 (2011).

Pérez, F., Granger, B. E. & Hunter, J. D. Python: an ecosystem for scientific computing. Comput. Sci. Eng. 13, 13–21 (2011). Explains why the scientific Python ecosystem is a highly productive environment for research.

Virtanen, P. et al. SciPy 1.0—fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020); correction 17, 352 (2020). Introduces the SciPy library and includes a more detailed history of NumPy and SciPy.

Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).

McKinney, W. Data structures for statistical computing in Python. In Proc. 9th Python in Science Conf. (eds van der Walt, S. & Millman, K. J.) 56–61 (2010).

Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

van der Walt, S. et al. scikit-image: image processing in Python. PeerJ 2, e453 (2014).

van der Walt, S., Colbert, S. C. & Varoquaux, G. The NumPy array: a structure for efficient numerical computation. Comput. Sci. Eng. 13, 22–30 (2011). Discusses the NumPy array data structure with a focus on how it enables efficient computation.

Wang, Q., Zhang, X., Zhang, Y. & Yi, Q. AUGEM: automatically generate high performance dense linear algebra kernels on x86 CPUs. In SC’13: Proc. Intl Conf. High Performance Computing, Networking, Storage and Analysis 25 (IEEE, 2013).

Xianyi, Z., Qian, W. & Yunquan, Z. Model-driven level 3 BLAS performance optimization on Loongson 3A processor. In 2012 IEEE 18th Intl Conf. Parallel and Distributed Systems 684–691 (IEEE, 2012).

Pérez, F. & Granger, B. E. IPython: a system for interactive scientific computing. Comput. Sci. Eng. 9, 21–29 (2007).

Kluyver, T. et al. Jupyter Notebooks—a publishing format for reproducible computational workflows. In Positioning and Power in Academic Publishing: Players, Agents and Agendas (eds Loizides, F. & Schmidt, B.) 87–90 (IOS Press, 2016).

Hagberg, A. A., Schult, D. A. & Swart, P. J. Exploring network structure, dynamics, and function using NetworkX. In Proc. 7th Python in Science Conf. (eds Varoquaux, G., Vaught, T. & Millman, K. J.) 11–15 (2008).

Astropy Collaboration et al. Astropy: a community Python package for astronomy. Astron. Astrophys. 558, A33 (2013).

Price-Whelan, A. M. et al. The Astropy Project: building an open-science project and status of the v2.0 core package. Astron. J. 156, 123 (2018).

Cock, P. J. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).

Millman, K. J. & Brett, M. Analysis of functional magnetic resonance imaging in Python. Comput. Sci. Eng. 9, 52–55 (2007).

The SunPy Community et al. SunPy—Python for solar physics. Comput. Sci. Discov. 8, 014009 (2015).

Hamman, J., Rocklin, M. & Abernathy, R. Pangeo: a big-data ecosystem for scalable Earth system science. In EGU General Assembly Conf. Abstracts 12146 (2018).

Chael, A. A. et al. ehtim: imaging, analysis, and simulation software for radio interferometry. Astrophysics Source Code Library https://ascl.net/1904.004 (2019).

Millman, K. J. & Pérez, F. Developing open source scientific practice. In Implementing Reproducible Research (eds Stodden, V., Leisch, F. & Peng, R. D.) 149–183 (CRC Press, 2014). Describes the software engineering practices embraced by the NumPy and SciPy communities with a focus on how these practices improve research.

van der Walt, S. The SciPy Documentation Project (technical overview). In Proc. 7th Python in Science Conf. (SciPy 2008) (eds Varoquaux, G., Vaught, T. & Millman, K. J.) 27–28 (2008).

Harrington, J. The SciPy Documentation Project. In Proc. 7th Python in Science Conf. (SciPy 2008) (eds Varoquaux, G., Vaught, T. & Millman, K. J.) 33–35 (2008).

Harrington, J. & Goldsmith, D. Progress report: NumPy and SciPy documentation in 2009. In Proc. 8th Python in Science Conf. (SciPy 2009) (eds Varoquaux, G., van der Walt, S. & Millman, K. J.) 84–87 (2009).

Royal Astronomical Society Report of the RAS ‘A’ Awards Committee 2020: Astropy Project: 2020 Group Achievement Award (A) https://ras.ac.uk/sites/default/files/2020-01/Group%20Award%20-%20Astropy.pdf (2020).

Wilson, G. Software carpentry: getting scientists to write better code by making them more productive. Comput. Sci. Eng. 8, 66–69 (2006).

Hannay, J. E. et al. How do scientists develop and use scientific software? In Proc. 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering 1–8 (IEEE, 2009).

Millman, K. J., Brett, M., Barnowski, R. & Poline, J.-B. Teaching computational reproducibility for neuroimaging. Front. Neurosci. 12, 727 (2018).

Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) 8024–8035 (Neural Information Processing Systems, 2019).

Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In OSDI’16: Proc. 12th USENIX Conf. Operating Systems Design and Implementation (chairs Keeton, K. & Roscoe, T.) 265–283 (USENIX Association, 2016).

Chen, T. et al. MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. Preprint at http://www.arxiv.org/abs/1512.01274 (2015).

Hoyer, S. & Hamman, J. xarray: N–D labeled arrays and datasets in Python. J. Open Res. Softw. 5, 10 (2017).

Entschev, P. Distributed multi-GPU computing with Dask, CuPy and RAPIDS. In EuroPython 2019 https://ep2019.europython.eu/media/conference/slides/fX8dJsD-distributed-multi-gpu-computing-with-dask-cupy-and-rapids.pdf (2019).

Behnel, S. et al. Cython: the best of both worlds. Comput. Sci. Eng. 13, 31–39 (2011).

Lam, S. K., Pitrou, A. & Seibert, S. Numba: a LLVM-based Python JIT compiler. In Proc. Second Workshop on the LLVM Compiler Infrastructure in HPC, LLVM ’15 7:1–7:6 (ACM, 2015).

Guelton, S. et al. Pythran: enabling static optimization of scientific Python programs. Comput. Sci. Discov. 8, 014001 (2015).

Dongarra, J., Golub, G. H., Grosse, E., Moler, C. & Moore, K. Netlib and NA-Net: building a scientific computing community. IEEE Ann. Hist. Comput. 30, 30–41 (2008).

Barrett, K. A., Chiu, Y. H., Painter, J. F., Motteler, Z. C. & Dubois, P. F. Basis System, Part I: Running a Basis Program—A Tutorial for Beginners UCRL-MA-118543, Vol. 1 (Lawrence Livermore National Laboratory, 1995).

Dubois, P. F. & Motteler, Z. Basis System, Part II: Basis Language Reference Manual UCRL-MA-118543, Vol. 2 (Lawrence Livermore National Laboratory, 1995).

Chiu, Y. H. & Dubois, P. F. Basis System, Part III: EZN User Manual UCRL-MA-118543, Vol. 3 (Lawrence Livermore National Laboratory, 1995).

Chiu, Y. H. & Dubois, P. F. Basis System, Part IV: EZD User Manual UCRL-MA-118543, Vol. 4 (Lawrence Livermore National Laboratory, 1995).

Munro, D. H. & Dubois, P. F. Using the Yorick interpreted language. Comput. Phys. 9, 609–615 (1995).

Ihaka, R. & Gentleman, R. R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996).

Iverson, K. E. A programming language. In Proc. 1962 Spring Joint Computer Conf. 345–351 (1962).

Jenness, T. et al. LSST data management software development practices and tools. In Proc. SPIE 10707, Software and Cyberinfrastructure for Astronomy V 1070709 (SPIE and International Society for Optics and Photonics, 2018).

Matsakis, N. D. & Klock, F. S. The Rust language. Ada Letters 34, 103–104 (2014).

Bezanson, J., Edelman, A., Karpinski, S. & Shah, V. B. Julia: a fresh approach to numerical computing. SIAM Rev. 59, 65–98 (2017).

Lattner, C. & Adve, V. LLVM: a compilation framework for lifelong program analysis and transformation. In Proc. 2004 Intl Symp. Code Generation and Optimization (CGO’04) 75–88 (IEEE, 2004).