Hands-On Exercise, Step 3: Aggregating the data¶
Aggregating the weather data¶
One of the tutorial authors has produced a new tool for aggregating
gridded data according to a shapefile:
xagg. Since it handles
partial grid-cell overlapping and arbitrary weighting grid, we
recommend that it be used irrespective of the language being used
elsewhere. Install it according to the instructions from GitHub.
You will need to download a shapefile for the US counties. We use one from the ESRI community website: https://community.esri.com/ccqpr47374/attachments/ccqpr47374/enterprise-portal/4124/1/UScounties.zip
Save its contents to a folder
geo_data in the
data directory. The
following code should be run from the
Setting up the environment¶
The approach uses
xarray for gridded data,
geopandas to work with shapefiles, and
xagg to aggregate gridded data onto shapefiles.
import xarray as xr import geopandas as gpd import xagg as xa import numpy as np
First, load the data from the previous steps:
# Load temperature data using xarray ds_tas = xr.open_dataset( 'climate_data/tas_day_BEST_historical_station_19800101-19891231.nc') # Load population data using xarray ds_pop = xr.open_dataset('pcount/usap90ag.nc') # Load county shapefiles using geopandas gdf_counties = gpd.read_file('geo_data/UScounties.shp')
Transforming the data¶
Next, we need to construct any nonlinear transformations of the data.
For our econometric model, we want temperature in both linear and quadratic form, centered around \(20^\circ\) C: \(T-20^\circ C\) and \(T^2 - (20^\circ C)^2\).
ds_tas['tas_adj'] = ds_tas.tas-20 ds_tas['tas_sq'] = ds_tas.tas**2 - 20**2 # xagg aggregates every gridded variable in ds_tas - however, we don't need # every variable currently in tas. Let'ss drop "tas" (the un-adjusted temperature) # and "land_mask" which is included, but not necessary for our further analysis. ds_tas = ds_tas.drop('tas') ds_tas = ds_tas.drop('land_mask')
Create map of pixels onto polygons¶
We need to create a weight map of pixels to polygons. For each polygon, we need to know which pixels overlap it and by how much.
xagg does this by creating polygons for each pixel in the gridded
dataset, taking the intersect between each county polygon and all
pixel polygons, and calculating the average area of overlap between
the pixels that touch the polygon and the polygon.
weightmap = xa.pixel_overlaps(ds_tas,gdf_counties,weights=ds_pop.Population,subset_bbox=False)
Aggregate values onto polygons¶
Using the weight map calculated above,
xagg now aggregates all the gridded variables in
tas_sq we calculated above) onto the county polygons. For each county, the weighted average of the temperatures of the pixels that cover the county is calculated. Since we included population as a desired weight above, the weight for each pixel is a combination of how much of the pixel overlaps with the county and its population density. In other words, a pixel that covers more of a county and has a higher population density will be weighted more in that county’s temperature average than a pixel that covers less of a county and is more sparsely populated.
The output of this function now gives, for each county, a 10-year time series of linear and quadratic temperature, properly area- and population-weighted.
aggregated is an object specific to the
xagg package. We need to modify it to be usable, for example using
aggregated.to_csv. See the
xagg docs for more info.
aggregated = xa.aggregate(ds_tas, weightmap)
Export this as a
.csv file to be used elsewhere¶
Finally, we need to export this data to be used elsewhere. For this tutorial, we want to allow a variety of tools, so we’ll
export the aggregated data into a
.csv (comma-separated value) file,
which can easily be read by R, STATA, and most other programming
tools. The data will be reshaped ‘wide’ by
xagg - so every row is a
county, and every column is a timestep of the variables.
aggregated.to_dataframe(), depending on whether you’d like to continue using it in