Introduction to Xarray#

Overview#

This notebook will introduce the basics of gridded, labeled data with Xarray. Since Xarray introduces additional abstractions on top of plain arrays of data, our goal is to show why these abstractions are useful and how they frequently lead to simpler, more robust code.

We’ll cover these topics:

Create a DataArray, one of the core object types in Xarray
Understand how to use named coordinates and metadata in a DataArray
Combine individual DataArrays into a Dataset, the other core object type in Xarray
Subset, slice, and interpolate the data using named coordinates
Open netCDF data using Xarray
Basic subsetting and aggregation of a Dataset
Brief introduction to plotting with Xarray

Prerequisites#

Concepts	Importance	Notes
NumPy Basics	Necessary
Intermediate NumPy	Helpful	Familiarity with indexing and slicing arrays
NumPy Broadcasting	Helpful	Familiarity with array arithmetic and broadcasting
Introduction to Pandas	Helpful	Familiarity with labeled data
Datetime	Helpful	Familiarity with time formats and the `timedelta` object
Understanding of NetCDF	Helpful	Familiarity with metadata structure

Time to learn: 30 minutes

Imports#

Similar to NumPy (np) and pandas (pd), Xarray is commonly imported under the shortened namespace xr. pythia_datasets provides example data for us to work with.

from datetime import timedelta

import numpy as np
import pandas as pd
import xarray as xr

Plotting with Xarray#

Another major benefit of using labeled data structures is that they enable automated plotting with sensible axis labels.

Simple visualization with `.plot()`#

Much like we saw in Pandas, Xarray includes an interface to Matplotlib that we can access through the .plot() method of every DataArray.

For quick and easy data exploration, we can just call .plot() without any modifiers:

prof.plot()

[<matplotlib.lines.Line2D at 0x17feb5f60>]

Here Xarray has generated a line plot of the temperature data against the coordinate variable isobaric1. The metadata are also used to auto-generate the axis labels, including units.
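To see where those labels come from, here is a minimal sketch on synthetic data. The `profile` array, its values, and its attribute strings are all made up for illustration and stand in for `prof`; the only assumption about Xarray is that it builds axis labels from the `long_name` and `units` attributes when they are present.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import xarray as xr

# Hypothetical stand-in for `prof`: a temperature profile on pressure levels.
levels = np.array([1000.0, 850.0, 700.0, 500.0, 300.0])
profile = xr.DataArray(
    np.array([288.0, 280.0, 272.0, 255.0, 228.0]),
    dims=["isobaric1"],
    coords={
        # (dimension name, values, attributes) for the coordinate variable
        "isobaric1": ("isobaric1", levels, {"long_name": "Isobaric surface", "units": "hPa"}),
    },
    attrs={"long_name": "Temperature", "units": "K"},
    name="temperature",
)

lines = profile.plot()  # 1D data: returns a list of Matplotlib Line2D objects
ax = lines[0].axes

# Both labels are assembled from the attributes set above.
print(ax.get_xlabel())
print(ax.get_ylabel())
```

Removing the `units` entries from the attribute dictionaries and re-running is a quick way to confirm that the units in the labels come from the metadata rather than from the data values.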

Customizing the plot#

As in Pandas, the .plot() method is mostly just a wrapper around Matplotlib, so we can customize our plot in familiar ways.

In this air temperature profile example, we would like to make two changes:

swap the axes so that we have isobaric levels on the y (vertical) axis of the figure
make pressure decrease upward in the figure, so that up is up

A few keyword arguments to our .plot() call will take care of this:

prof.plot(y="isobaric1", yincrease=False)

[<matplotlib.lines.Line2D at 0x17ffae830>]
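The same keyword arguments can be combined with ordinary Matplotlib options. The sketch below uses a synthetic profile (the `profile` array and its values are hypothetical, not the tutorial's `prof`) and passes an existing Matplotlib axes plus line-style keywords through `.plot()`.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Hypothetical temperature profile on pressure levels.
levels = np.array([1000.0, 850.0, 700.0, 500.0, 300.0])
profile = xr.DataArray(
    np.array([288.0, 280.0, 272.0, 255.0, 228.0]),
    dims=["isobaric1"],
    coords={"isobaric1": ("isobaric1", levels, {"units": "hPa"})},
    attrs={"long_name": "Temperature", "units": "K"},
)

fig, ax = plt.subplots(figsize=(4, 5))

# y= puts pressure on the vertical axis; yincrease=False inverts that axis.
# color= and marker= are forwarded to Matplotlib's line plotting.
profile.plot(ax=ax, y="isobaric1", yincrease=False, color="tab:red", marker="o")

# The returned axes is a normal Matplotlib object, so further tweaks apply as usual.
ax.set_title("Synthetic temperature profile")
ax.grid(True)
```

Creating the axes first with `plt.subplots` and passing it in as `ax=` is one way to keep control of figure size and layout while still letting Xarray supply the labels.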

Plotting 2D data#

In the example above, the .plot() method produced a line plot.

What if we call .plot() on a 2D array?

temps.sel(isobaric1=1000).plot()

<matplotlib.collections.QuadMesh at 0x18002a0b0>

Xarray has recognized that the DataArray object calling the plot method has two coordinate variables, and has generated a 2D plot using the pcolormesh method from Matplotlib.

In this case, we are looking at air temperatures on the 1000 hPa isobaric surface over North America. We could of course improve this figure by using Cartopy to handle the map projection and geographic features!
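The switch from a line plot to a 2D plot can be checked in isolation with a small synthetic field. In this sketch the `field` array, its coordinates, and its values are invented for illustration; it only shows that calling `.plot()` on 2D data returns a Matplotlib `QuadMesh`, the object type produced by `pcolormesh`.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import xarray as xr
from matplotlib.collections import QuadMesh

# Hypothetical 2D temperature field on a regular x/y grid.
x = np.linspace(-100.0, 100.0, 21)
y = np.linspace(-50.0, 50.0, 11)
field = xr.DataArray(
    280.0 + 0.05 * x[np.newaxis, :] - 0.1 * y[:, np.newaxis],  # shape (11, 21)
    dims=["y", "x"],
    coords={"x": x, "y": y},
    attrs={"long_name": "Temperature", "units": "K"},
)

mesh = field.plot()  # 2D data: Xarray uses pcolormesh by default

print(type(mesh).__name__)
print(isinstance(mesh, QuadMesh))
```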

Summary#

Xarray brings the joy of Pandas-style labeled data operations to N-dimensional data. As such, it has become a central workhorse in the geoscience community for the analysis of gridded datasets. Xarray allows us to open self-describing NetCDF files and make full use of the coordinate axes, labels, units, and other metadata. By making use of labeled coordinates, our code is often easier to write, easier to read, and more robust.
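As a compact illustration of that last point, the sketch below computes the same quantity twice: once with positional NumPy indexing and once with labeled selection. The array, dates, and pressure levels are synthetic and chosen only for this example.

```python
import numpy as np
import xarray as xr

# Synthetic data: 3 times x 4 pressure levels.
levels = np.array([1000.0, 850.0, 700.0, 500.0])
times = np.array(["2024-01-01", "2024-01-02", "2024-01-03"], dtype="datetime64[ns]")
values = np.arange(12, dtype=float).reshape(3, 4)

temps = xr.DataArray(
    values,
    dims=["time", "isobaric1"],
    coords={"time": times, "isobaric1": levels},
)

# Positional: we must remember that axis 1 is pressure and index 2 is 700 hPa.
by_position = values[:, 2].mean()

# Labeled: the intent is spelled out, and it does not depend on axis order.
by_label = temps.sel(isobaric1=700.0).mean(dim="time")

print(float(by_position), float(by_label))
```

If the dimensions were transposed, the positional version would silently average the wrong slice, while the labeled version would still select the 700 hPa level.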

What’s next?#

Additional notebooks to appear in this section will go into more detail about

arithmetic and broadcasting with Xarray data structures
using “group by” operations
remote data access with OPeNDAP
more advanced visualization including map integration with Cartopy

Resources and references#

This notebook was adapted from material in Unidata’s Python Training.

The best resource for Xarray is the Xarray documentation.

Another excellent resource is this Xarray Tutorial collection.
