WEMC Tech Blog 4.5: Calculating NUTS2 Regional Averages with Land Sea Mask.

This post serves as a continuation on the techniques described in Tech Blog #4. So please familiarise with those steps beforehand.


As before, load in the NUTS shapefiles, this time selecting NUTS2.

NUTS2 is higher resolution and as a result, there are many more shapefiles. NUTS2 contains polygons at a regional (sub-country) level. In total there are 332 shapes for the Eurostat EU region.

Obtain the latest shapefiles from Eurostat.

Select a particular region with the 4 character NUTS_ID string.

For example, Stuttgart (DE):

region = nuts[nuts['NUTS_ID'] == 'DE11']


Land Sea Mask (LSM)

The land-sea mask is a field that contains, for every grid point, the proportion of land in the grid box. The values are between 0 (sea) and 1 (land).

You can obtain the ECMWF LSM NetCDF file here

A quick plot of the LSM gives this visual representation of the mask:

Calculate NUT2 Area Averages, with LSM

Now, following the same steps as described in Tech Blog #4, it is possible to calculate the area averages, with the LSM applied.

# contact luke.sanger@wemcouncil.org

# import packages
import iris
import geopandas as gpd
import numpy as np
import cartopy.crs as ccrs
import shapely
import iris.pandas
import iris.analysis.cartography
import os
import time

# place this file in directory where ERA5 files are located and define path below:
directory_in_str = '/data/private/wemc/100WS'

# read the NUTS shapefile and extract the polygons for a individual countries

# get the latitude-longitude grid from LSM file
cube2=cube2.intersection(longitude=(-180, 180))

# function to check whether data is within shapefile - source: http://bit.ly/2pKXnWa

def geom_to_masked_cube(cube, geometry, x_coord, y_coord,
    Convert a shapefile geometry into a mask for a cube's data.


    * cube:
        The cube to mask.
    * geometry:
        A geometry from a shapefile to define a mask.
    * x_coord: (str or coord)
        A reference to a coord describing the cube's x-axis.
    * y_coord: (str or coord)
        A reference to a coord describing the cube's y-axis.


    * mask_excludes: (bool, default False)
        If False, the mask will exclude the area of the geometry from the
        cube's data. If True, the mask will include *only* the area of the
        geometry in the cube's data.

    .. note::
        This function does *not* preserve lazy cube data.

    # Get horizontal coords for masking purposes.
    lats = cube.coord(y_coord).points
    lons = cube.coord(x_coord).points
    lon2d, lat2d = np.meshgrid(lons,lats)

    # Reshape to 1D for easier iteration.
    lon2 = lon2d.reshape(-1)
    lat2 = lat2d.reshape(-1)

    mask = []
    # Iterate through all horizontal points in cube, and
    # check for containment within the specified geometry.
    for lat, lon in zip(lat2, lon2):
        this_point = gpd.geoseries.Point(lon, lat)
        res = geometry.contains(this_point)

    mask = np.array(mask).reshape(lon2d.shape)
    if mask_excludes:
        # Invert the mask if we want to include the geometry's area.
        mask = ~mask
    # Make sure the mask is the same shape as the cube.
    dim_map = (cube.coord_dims(y_coord)[0],
    cube_mask = iris.util.broadcast_to_shape(mask, cube.shape, dim_map)

    # Apply the mask to the cube's data.
    data = cube.data
    masked_data = np.ma.masked_array(data, cube_mask)
    cube.data = masked_data
    return cube

directory = os.fsencode(directory_in_str)

# iterate through NUTS file
for i, row in nuts.iterrows():
    # select NUTS_ID as a string
    cntry = str(row.NUTS_ID)
    # assign row to variable based on NUTS_ID string
    geom = nuts[nuts['NUTS_ID'] == cntry]
    # loop through directory of NetCDF files
    for file in os.listdir(directory):
        filename = os.fsdecode(file)
        if filename.endswith(".nc"):

            # get the latitude-longitude grid from netcdf file
            cube=cube.intersection(longitude=(-180, 180))

            # apply the geom_to_masked_cube function
            geometry = geom
            masked_cube = geom_to_masked_cube(cube, geometry, 'longitude', 'latitude', mask_excludes=True)
            # apply the geom_to_masked_cube function to the LSM
            masked_cube2 = geom_to_masked_cube(cube2, geometry, 'longitude', 'latitude', mask_excludes=True)

            # use iris function to get area weights
            grid_areas = iris.analysis.cartography.cosine_latitude_weights(masked_cube)
            grid_areaslsm = iris.analysis.cartography.cosine_latitude_weights(masked_cube2)

            grid_lsm = grid_areas * grid_areaslsm

            # create new time series cube with area averages
            new_cube = masked_cube.collapsed(['latitude','longitude'], iris.analysis.MEAN, weights=grid_lsm)

            # convert to series and retain the time series - use this!
            dfs = iris.pandas.as_series(new_cube, copy=True)

            #generate a new filename
            edit = str(filename)
            edit2 = edit.rstrip(".nc")
            csv_name = cntry + "_" + edit2 + ".csv"

            # save as csv

            # pause loop to save CPU drain
NUTS2 Area Average

As usual, you can also get the code from my GitHub page.

by Luke Sanger (WEMC Data Engineer, 2018)

Recent Posts

Leave a Comment

Start typing and press Enter to search