Working with H3 data

H3 is a popular icosahedral DGGS with hexagonal cells, developed and popularized by Uber. For more information, see https://h3geo.org. This tutorial shows how to work with H3 data using xdggs.

Import libraries

import xarray as xr

import xdggs

_ = xr.set_options(display_expand_data=False)

Initialization

To initialize, we first have to open the dataset. Here we’ll use xarray’s air_temperature tutorial dataset, which was interpolated to the H3 grid.

Tip

If the dataset you want to work on is not already on an H3 grid, you will have to use a different package to interpolate it.

Warning

For the purpose of this tutorial we drop the geographic coordinates and load all data into memory, but this is not required.

original_ds = xdggs.tutorial.open_dataset("air_temperature", "h3").load()
air_temperature = original_ds.drop_vars(["lat", "lon"])
air_temperature
<xarray.Dataset> Size: 16MB
Dimensions:   (time: 2920, cells: 695)
Coordinates:
  * time      (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
    cell_ids  (cells) uint64 6kB 585508633488392191 ... 587389348127703039
Dimensions without coordinates: cells
Data variables:
    air       (time, cells) float64 16MB 246.3 247.5 237.3 ... 300.3 299.2 295.6
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

After that, we can use xdggs.decode() to tell xdggs to interpret the cell ids.

This creates a grid object containing the grid parameters (see xarray.Dataset.dggs.grid_info and xdggs.H3Info for more information) and a custom index for the cell_ids coordinate, which allows us to perform grid-aware operations. Notice how the coordinate name is now displayed in bold.

Important

For this to work, the dataset has to have a coordinate called cell_ids, and that coordinate has to carry the grid_name and level attributes.

The grid_name refers to the short name of the grid, while level refers to the hierarchical level of the grid (the h3 libraries call this the “resolution”, while xdggs uses “level” for all grids).

In this case, the attributes on cell_ids are:

{
    "grid_name": "h3",
    "level": 2,
}
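
As a sanity check, the level can also be read from the cell ids themselves. The sketch below assumes the 64-bit index layout described in the H3 documentation (4 mode bits starting at bit 59, 4 resolution bits starting at bit 52, 7 base-cell bits starting at bit 45). The helper name h3_index_fields is hypothetical and is not part of xdggs or the h3 library.

```python
def h3_index_fields(cell_id: int) -> dict:
    """Split a 64-bit H3 cell index into its header fields.

    Assumes the bit layout documented for H3 indexes; this helper is
    illustrative only and not part of xdggs or the h3 library.
    """
    return {
        "mode": (cell_id >> 59) & 0xF,        # 1 marks a cell index
        "level": (cell_id >> 52) & 0xF,       # H3 calls this the resolution
        "base_cell": (cell_id >> 45) & 0x7F,  # index of the level-0 ancestor
    }


# First and last cell ids shown in the dataset repr above
cell_ids = [585508633488392191, 587389348127703039]

fields = [h3_index_fields(cell_id) for cell_id in cell_ids]
for cell_id, field in zip(cell_ids, fields):
    print(hex(cell_id), field)
```

Both ids report level 2, which agrees with the level attribute. In real code, rely on the attribute (or on the h3 library) rather than on manual bit manipulation.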
ds = air_temperature.pipe(xdggs.decode)
ds
<xarray.Dataset> Size: 16MB
Dimensions:   (time: 2920, cells: 695)
Coordinates:
  * time      (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
  * cell_ids  (cells) uint64 6kB 585508633488392191 ... 587389348127703039
Dimensions without coordinates: cells
Data variables:
    air       (time, cells) float64 16MB 246.3 247.5 237.3 ... 300.3 299.2 295.6
Indexes:
    cell_ids  H3Index(level=2)
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
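
To put level=2 into context: according to the H3 documentation, a global grid at resolution r has 2 + 120 * 7**r cells. The sketch below uses that formula to estimate how much of the globe the 695 cells of this dataset cover. The function name h3_cell_count is hypothetical.

```python
def h3_cell_count(level: int) -> int:
    """Number of cells in a global H3 grid at the given level.

    Uses the formula from the H3 documentation; the function itself is
    an illustrative helper, not part of xdggs.
    """
    if not 0 <= level <= 15:
        raise ValueError("H3 levels range from 0 to 15")
    return 2 + 120 * 7**level


n_cells_in_dataset = 695  # size of the "cells" dimension above
n_cells_global = h3_cell_count(2)
fraction = n_cells_in_dataset / n_cells_global

print(f"{n_cells_in_dataset} of {n_cells_global} cells ({fraction:.1%} of the globe)")
```

The dataset therefore holds a regional subset of the grid, roughly 12% of all level-2 cells.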

Deriving data

With the grid object and the custom index, we can derive additional data from the cell ids.

Cell center coordinates

For example, we can reconstruct the cell centers we dropped from the original dataset, using xarray.Dataset.dggs.cell_centers():

cell_centers = ds.dggs.cell_centers()
cell_centers
<xarray.Dataset> Size: 11kB
Dimensions:    (cells: 695)
Coordinates:
    latitude   (cells) float64 6kB 71.56 72.04 72.36 71.65 ... 17.98 17.7 17.52
    longitude  (cells) float64 6kB -78.06 -87.72 -122.4 ... -75.84 -92.88 -95.97
Dimensions without coordinates: cells
Data variables:
    *empty*

These are the same as the ones we dropped before:

derived_ds = ds.assign_coords(
    cell_centers.rename_vars({"latitude": "lat", "longitude": "lon"}).coords
)
derived_ds
<xarray.Dataset> Size: 16MB
Dimensions:   (time: 2920, cells: 695)
Coordinates:
  * time      (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
  * cell_ids  (cells) uint64 6kB 585508633488392191 ... 587389348127703039
    lat       (cells) float64 6kB 71.56 72.04 72.36 71.65 ... 17.98 17.7 17.52
    lon       (cells) float64 6kB -78.06 -87.72 -122.4 ... -75.84 -92.88 -95.97
Dimensions without coordinates: cells
Data variables:
    air       (time, cells) float64 16MB 246.3 247.5 237.3 ... 300.3 299.2 295.6
Indexes:
    cell_ids  H3Index(level=2)
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
xr.testing.assert_allclose(derived_ds, original_ds)
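
The check above compares values within a tolerance rather than bit for bit, which suits recomputed floating-point coordinates that may differ in their last digits. A small sketch on made-up data shows the difference from an exact comparison. The three latitudes and the 1e-9 offset are illustrative assumptions, not values taken from the dataset.

```python
import numpy as np
import xarray as xr

# Made-up latitudes standing in for stored cell centers
lat = np.array([71.56, 72.04, 72.36])

reference = xr.Dataset(coords={"lat": ("cells", lat)})
# Simulate a tiny numerical difference in the recomputed coordinates
recomputed = xr.Dataset(coords={"lat": ("cells", lat + 1e-9)})

# Tolerance-based comparison: passes despite the small offset
xr.testing.assert_allclose(recomputed, reference)

# Exact comparison: raises because the values are not bitwise equal
try:
    xr.testing.assert_identical(recomputed, reference)
except AssertionError:
    exact_match = False
else:
    exact_match = True

print("exact match:", exact_match)
```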

Cell boundary polygons

Additionally, we can derive the cell boundary polygons as an array of shapely polygons using xarray.Dataset.dggs.cell_boundaries():

cell_boundaries = ds.dggs.cell_boundaries()
cell_boundaries
<xarray.DataArray (cells: 695)> Size: 6kB
POLYGON ((-76.58894477052608 73.22979877705764, -82.3342737067018 72.71199367...
Coordinates:
    cell_ids  (cells) uint64 6kB 585508633488392191 ... 587389348127703039
Dimensions without coordinates: cells

Plotting

We can quickly visualize the data using xarray.DataArray.dggs.explore(), which is powered by lonboard.

Note

The slider requires a running kernel, so this won’t work in static documentation.

ds["air"].dggs.explore()