WEMC Tech Blog #4: Calculating EU Country Averages With ERA5 and NUTS

As in previous blogs, this one focuses on the Python programming language. For information on installing Python and packages, please see technical blogs #1 and #2.

This example looks at how to mask ERA5 data for the European region, downloaded from the ECMWF Climate Data Store (CDS). The procedure relies heavily on functions provided by the SciTools Iris package, although similar results could be obtained with a combination of packages such as xarray and regionmask.

Import the following packages:

# import packages
import iris
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import iris.plot as iplt
import iris.quickplot as qplt
import iris.pandas
import os

Read in the NUTS (Nomenclature of Territorial Units for Statistics) shapefile using geopandas. NUTS shapefiles are obtained from Eurostat, who provide various resolutions and formats to use in your code. For this example, a 1:20 million scale is used, in the SHP (shapefile) format. Level 0 is the coarsest level: a collection of geometric polygons at country level.

# read the NUTS shapefile and extract the polygons for individual countries
# (the filename below is illustrative - use the name of your Eurostat download)
nuts = gpd.read_file('NUTS_RG_20M_2016_4326_LEVL_0.shp')
A quick check of the NUTS shapefile reveals the geometry values, with some associated identifiers.


Select a country by its abbreviation (the ‘NUTS_ID’ column). For example, ‘FR’ selects the polygon for France.

# set country by NUTS_ID
country = nuts[nuts['NUTS_ID'] == 'FR']

Load in a NetCDF file using Iris. Iris has many built-in functions for dealing with NetCDF. As discussed in previous blogs, NetCDF is a file format for storing multidimensional data. Iris represents this data as ‘cubes’, a convenient way to manipulate NetCDF data whilst keeping things CF-compliant. Here, a NetCDF containing 1 year (2000) of hourly 2 metre temperature, covering the EU region, is loaded.

# load the netcdf file as a cube (the filename is illustrative - use your CDS download)
cube = iris.load_cube('era5_t2m_eu_2000.nc')

# roll the latitude-longitude grid onto a -180 to 180 longitude range
cube = cube.intersection(longitude=(-180, 180))
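ERA5 files are often delivered with longitudes running 0 to 360, and the ‘intersection’ call rolls them onto a -180 to 180 range so they line up with the shapefile coordinates. The underlying remapping can be sketched in plain NumPy (illustrative values, no Iris required; note that Iris also reorders the data columns to match):

```python
import numpy as np

# Hypothetical 0-360 longitude axis, as often found in ERA5 files.
lons = np.array([0.0, 90.0, 180.0, 270.0, 359.0])

# Map each longitude onto the -180..180 range.
wrapped = ((lons + 180.0) % 360.0) - 180.0

print(wrapped)  # [   0.   90. -180.  -90.   -1.]
```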

Once loaded, a quick check of the cube shows it has three dimensions: time, latitude and longitude, with the variable 2 metre temperature (K).


Next, create a function to check whether the data is within the boundaries of the shapefile. Surprisingly, Iris does not have a function to do this out of the box, so I borrowed this good example from a post on Stack Overflow (credit to user DPeterK).

# function to check whether data is within shapefile - source: http://bit.ly/2pKXnWa

from shapely.geometry import Point

def geom_to_masked_cube(cube, geometry, x_coord, y_coord,
                        mask_excludes=False):
    """
    Convert a shapefile geometry into a mask for a cube's data.

    Args:

    * cube:
        The cube to mask.
    * geometry:
        A geometry from a shapefile to define a mask.
    * x_coord: (str or coord)
        A reference to a coord describing the cube's x-axis.
    * y_coord: (str or coord)
        A reference to a coord describing the cube's y-axis.

    Kwargs:

    * mask_excludes: (bool, default False)
        If False, the mask will exclude the area of the geometry from the
        cube's data. If True, the mask will include *only* the area of the
        geometry in the cube's data.

    .. note::
        This function does *not* preserve lazy cube data.

    """
    # Get horizontal coords for masking purposes.
    lats = cube.coord(y_coord).points
    lons = cube.coord(x_coord).points
    lon2d, lat2d = np.meshgrid(lons, lats)

    # Reshape to 1D for easier iteration.
    lon2 = lon2d.reshape(-1)
    lat2 = lat2d.reshape(-1)

    mask = []
    # Iterate through all horizontal points in cube, and
    # check for containment within the specified geometry.
    for lat, lon in zip(lat2, lon2):
        this_point = Point(lon, lat)
        res = geometry.contains(this_point)
        mask.append(res.values[0])

    mask = np.array(mask).reshape(lon2d.shape)
    if mask_excludes:
        # Invert the mask if we want to include the geometry's area.
        mask = ~mask
    # Make sure the mask spans all of the cube's dimensions.
    dim_map = (cube.coord_dims(y_coord)[0],
               cube.coord_dims(x_coord)[0])
    cube_mask = iris.util.broadcast_to_shape(mask, cube.shape, dim_map)

    # Apply the mask to the cube's data.
    data = cube.data
    masked_data = np.ma.masked_array(data, cube_mask)
    cube.data = masked_data
    return cube
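To see what this function is doing without a shapefile to hand, the same mask-and-broadcast pattern can be sketched in plain NumPy, using a simple latitude/longitude box as a stand-in for a country polygon (all values below are made up for illustration):

```python
import numpy as np

# Toy grid: 4 latitudes x 5 longitudes, with a leading "time" dimension of 3.
lats = np.array([40.0, 45.0, 50.0, 55.0])
lons = np.array([-5.0, 0.0, 5.0, 10.0, 15.0])
data = np.arange(3 * 4 * 5, dtype=float).reshape(3, 4, 5)

# Stand-in "geometry": keep points inside a simple lat/lon box.
lon2d, lat2d = np.meshgrid(lons, lats)
inside = (lat2d >= 42.0) & (lat2d <= 52.0) & (lon2d >= -2.0) & (lon2d <= 8.0)

# As in geom_to_masked_cube with mask_excludes=True: mask everything
# *outside* the geometry (np.ma masks where the mask array is True).
mask2d = ~inside

# Broadcast the 2-D mask across the time dimension, like broadcast_to_shape.
mask3d = np.broadcast_to(mask2d, data.shape)

masked = np.ma.masked_array(data, mask3d)
print(masked.count())  # 12 unmasked points: 4 inside points x 3 time steps
```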

The function can now be applied to the cube, in this case checking whether the data is within the France polygon.

# apply the geom_to_masked_cube function
geometry = country
masked_cube = geom_to_masked_cube(cube, geometry, 'longitude', 'latitude', mask_excludes=True)

Plotting an individual time slice from ‘masked_cube’ reveals that the data is now successfully masked to the France region.

# plot to check area masking is correct
plt.figure(figsize=(20, 10))
ax = plt.axes(projection=ccrs.PlateCarree())

# pick time slice and draw the contour with 100 levels + select appropriate cmap.
q = iplt.contourf(masked_cube[5000], 100, cmap='hot_r')

# add colour bar w/label
cb = plt.colorbar()
cb.set_label('2m Temperature in K')

# add coastlines to the map created by contourf.
ax.coastlines()

# add stock background image
ax.stock_img()

# add title
plt.title('Region masked by NUTS shapefile - 2000-07-27 08:00')

# Add a citation to the plot.
iplt.citation('Luke Sanger - WEMC (2018)')

# set map limits to show only EU
ax.set_extent([-30.0, 20.0, 30.0, 80.0], ccrs.PlateCarree())

# show plot
plt.show()

Before getting the area average, it’s important to note that the current grid is Euclidean: a flat grid whose cells are all treated as equal in size. To account for the curvature of the Earth, create area weights using the Iris ‘cosine_latitude_weights’ function.

# use iris function to get area weights
grid_areas = iris.analysis.cartography.cosine_latitude_weights(masked_cube)

These weights can be used in conjunction with the ‘iris.analysis.MEAN’ aggregator to collapse the cube’s horizontal dimensions, leaving a 1-dimensional time series.

new_cube = masked_cube.collapsed(['latitude','longitude'], iris.analysis.MEAN, weights=grid_areas)
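The weighted collapse above amounts to a cosine-latitude weighted average over the horizontal axes. A minimal NumPy equivalent, with toy values standing in for the real cube:

```python
import numpy as np

# Toy temperature field on a 3-latitude x 2-longitude grid (values in K).
lats = np.array([0.0, 45.0, 60.0])
temps = np.array([[300.0, 300.0],
                  [280.0, 280.0],
                  [260.0, 260.0]])

# Cosine-latitude weights, one per grid cell, as produced (up to
# normalisation) by iris.analysis.cartography.cosine_latitude_weights.
weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(temps)

# Weighted mean over both horizontal axes - higher latitudes count for less.
area_mean = np.average(temps, weights=weights)
print(round(area_mean, 2))  # 284.53
```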

Finally, to output this to a .csv file, first convert the cube to a pandas Series, then save it as csv.

# convert to pandas series 
dfs = iris.pandas.as_series(new_cube)

# save as csv (the output filename is illustrative - choose your own)
dfs.to_csv('t2m_fr_2000.csv', header=True)

Opening the .csv file should produce a dataset containing the time series and the area averages for that country.
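As a standalone sketch of this final step, a time-indexed pandas Series (synthetic values here, in place of the collapsed cube) can be round-tripped through csv:

```python
import io
import pandas as pd

# Hypothetical hourly area averages for a short period.
times = pd.date_range('2000-01-01', periods=4, freq='h')
series = pd.Series([275.2, 275.4, 275.9, 276.3], index=times, name='t2m_mean')

# Round-trip through csv text (in the blog, this would be a file path).
buf = io.StringIO()
series.to_csv(buf, header=True)
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)
print(restored.shape)  # (4, 1)
```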

In the future I will revisit this topic with a follow-up blog looking at applying a land-sea mask to the dataset. This improves accuracy where land-sea boundaries cross grid cells, by weighting each cell accordingly.

An example dataset can be downloaded here

Complete IPython notebook code can be found on my GitHub page here

by Luke Sanger (WEMC Data Engineer, 2018)
