Hands-On Exercise, Step 3: Aggregating the data¶
Aggregating the weather data¶
One of the tutorial authors has produced a new tool for aggregating
gridded data according to a shapefile: xagg
. Since it handles
partial grid-cell overlapping and arbitrary weighting grid, we
recommend that it be used irrespective of the language being used
elsewhere. Install it according to the instructions from GitHub.
You will need to download a shapefile for the US counties. We use one from the ESRI community website: https://community.esri.com/ccqpr47374/attachments/ccqpr47374/enterprise-portal/4124/1/UScounties.zip
Save its contents to a folder geo_data
in the data
directory. The
following code should be run from the data
directory.
Setting up the environment¶
The approach uses xarray
for gridded data, geopandas
to work with shapefiles, and xagg
to aggregate gridded data onto shapefiles.
import xarray as xr
import geopandas as gpd
import xagg as xa
import numpy as np
First, load the data from the previous steps:
# Load temperature data using xarray
ds_tas = xr.open_dataset(
'climate_data/tas_day_BEST_historical_station_19800101-19891231.nc')
# Load population data using xarray
ds_pop = xr.open_dataset('pcount/usap90ag.nc')
# Load county shapefiles using geopandas
gdf_counties = gpd.read_file('geo_data/UScounties.shp')
Transforming the data¶
Next, we need to construct any nonlinear transformations of the data.
For our econometric model, we want temperature in both linear and quadratic form, centered around \(20^\circ\) C: \(T-20^\circ C\) and \(T^2 - (20^\circ C)^2\).
ds_tas['tas_adj'] = ds_tas.tas-20
ds_tas['tas_sq'] = ds_tas.tas**2 - 20**2
# xagg aggregates every gridded variable in ds_tas - however, we don't need
# every variable currently in tas. Let'ss drop "tas" (the un-adjusted temperature)
# and "land_mask" which is included, but not necessary for our further analysis.
ds_tas = ds_tas.drop('tas')
ds_tas = ds_tas.drop('land_mask')
Create map of pixels onto polygons¶
We need to create a weight map of pixels to polygons. For each polygon, we need to know which pixels overlap it and by how much.
xagg
does this by creating polygons for each pixel in the gridded
dataset, taking the intersect between each county polygon and all
pixel polygons, and calculating the average area of overlap between
the pixels that touch the polygon and the polygon.
weightmap =
xa.pixel_overlaps(ds_tas,gdf_counties,weights=ds_pop.Population,subset_bbox=False)
Aggregate values onto polygons¶
Using the weight map calculated above, xagg
now aggregates all the gridded variables in ds_tas
(the tas_adj
and tas_sq
we calculated above) onto the county polygons. For each county, the weighted average of the temperatures of the pixels that cover the county is calculated. Since we included population as a desired weight above, the weight for each pixel is a combination of how much of the pixel overlaps with the county and its population density. In other words, a pixel that covers more of a county and has a higher population density will be weighted more in that county’s temperature average than a pixel that covers less of a county and is more sparsely populated.
The output of this function now gives, for each county, a 10-year time series of linear and quadratic temperature, properly area- and population-weighted.
aggregated
is an object specific to the xagg
package. We need to modify it to be usable, for example using aggregated.to_dataset()
or aggregated.to_csv
. See the xagg
docs for more info.
aggregated = xa.aggregate(ds_tas, weightmap)
Export this as a .csv
file to be used elsewhere¶
Finally, we need to export this data to be used elsewhere. For this tutorial, we want to allow a variety of tools, so we’ll
export the aggregated data into a .csv
(comma-separated value) file,
which can easily be read by R, STATA, and most other programming
tools. The data will be reshaped ‘wide’ by xagg
- so every row is a
county, and every column is a timestep of the variables.
Use aggregated.to_dataset()
or aggregated.to_dataframe()
, depending on whether you’d like to continue using it in xarray
or pandas
.
aggregated.to_csv('climate_data/agg_vars.csv')