Weighting schemes#

This section describes how to use different weighting schemes when working with gridded and regional data. As we use the term, a weighting scheme assigns a weight to each grid cell or regional observation. Weighting schemes have two major uses: (1) when running regressions, to weight observations so that the estimates reflect the process of interest, and (2) to perform a weighted aggregation of gridded data up to data regions.

Why spatial weighting schemes matter#

Taking the unweighted average of weather within a region can misrepresent what populations, firms, or other phenomena of interest are exposed to. For example, an unweighted annual average temperature for Canada is about -8°C, but most of the population and agricultural activity is in climate zones with mean temperatures over 6°C, and the urban heat island effect can raise temperatures by another 4°C. The time of year matters too, and you should consider a weighting scheme across days within a year, or even hours within a day.

As described in the section spatial and temporal scales of economic processes, the scale of a phenomenon matters. Many processes occur at a more local scale than the scale at which data are collected. The motivation for weighting is different for aggregation that represents averaged phenomena vs. phenomena that respond to averaged weather, and the sequence of analysis changes accordingly.

When the phenomenon occurs locally, in response to local weather, we perform weighted aggregations to reflect the amount of the phenomenon in each location. For example, we would use population weighting to model the effects of heat on people. In this case, the order of operations is:

  1. Transform weather into the terms of the model specification.

  2. Average these transformed terms across space using a weighting scheme.

When the phenomenon occurs at the data region level, in response to averaged weather, the weighting scheme reflects the relative importance of the weather in different parts of the region to the whole. For example, weighting rainfall by distance from the coast could be important to predict the declaration of states of emergency. The order of operations is reversed (see the sketch after this list):

  1. Average the weather across space using a weighting scheme.

  2. Transform the averaged weather to the model specification.
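Because most model specifications are nonlinear, these two orderings generally give different answers. Here is a minimal Python sketch with made-up numbers, using a degree-day-style transform:

import numpy as np

# Hypothetical pixel temperatures (°C) and normalized weights for one region
temps = np.array([18.0, 22.0, 31.0, 25.0])
weights = np.array([0.1, 0.4, 0.2, 0.3])

# Local phenomenon: transform first (degree days above 20°C), then average
local = np.sum(weights * np.maximum(temps - 20.0, 0.0))  # = 4.5

# Regional phenomenon: average first, then transform
regional = max(np.sum(weights * temps) - 20.0, 0.0)      # = 4.3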

In either case, the weighting scheme is the same:

\[T_{it} = \sum_{p \in P(i)} w_p T_{pt} \text{ such that } \sum_{p \in P(i)} w_p = 1 \,\,\,\forall i\]

where \(w_p\) is the weight for pixel \(p\), and \(P(i)\) is the set of pixels in data region \(i\).
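In code, this aggregation is a normalized dot product over the pixels of each region. A minimal numpy sketch, with illustrative values:

import numpy as np

# T_pt for the pixels in one region at one time step, and unnormalized
# weights (e.g., pixel populations); both sets of values are illustrative
pixel_temps = np.array([24.1, 25.3, 22.8])
raw_weights = np.array([1200.0, 300.0, 0.0])

w = raw_weights / raw_weights.sum()  # enforce that the weights sum to 1
T_i = np.sum(w * pixel_temps)        # the aggregated value T_it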

Where to get spatial weighting data#

Below are some common data sources for various weighting schemes.

Working with gridded weighting data#

Weighting data files come in a wide range of file formats, since any gridded data file can be used as a weighting scheme. The most common data types are CSV, ASC, GeoTIFF, and BIL files. In each case, you (or your code) need to know (1) the format of the data values, (2) the spatial gridding scheme, (3) the projection, and (4) how missing data is handled. These are described in the sections below.

The format of the data values#

Data values can be written out in text (as with CSV and ASC files) or in a binary representation (GeoTIFF and BIL). If the values are written as text, delimiters will be used to separate them (comma for CSV, spaces for ASC).

The spatial gridding scheme#

The spatial gridding scheme is determined by six numbers: the latitude and longitude of an origin point, meridional (north-south) and zonal (east-west) cell lengths, and a number of rows and columns.

  - The most common origin point is the location of the lower-left corner of the lower-left grid cell. For example, for a global dataset, that might be 90°S, 180°W, which is represented in x, y coordinates as (-180, -90). Sometimes (particularly with NetCDF files), grid cell center locations will be used instead.

  - Grid cell sizes are often given as the decimal representation of a fraction of a degree, such as \(0.0083333333333 = 1 / 120\) of a degree, the cell size needed to ensure km-scale resolution globally. Usually, the meridional and zonal grid cell lengths are the same and are reported as a single number.

  - The number of grid cells is the most common way to describe the spatial coverage of the dataset. A global dataset will have 180 / cellsize rows and 360 / cellsize columns.

Based on this information, you can calculate which grid cell any point on the globe falls into:

\[\text{row} = \text{floor}\left(\frac{\text{Latitude} - y_0}{\text{CellSize}}\right),\]
\[\text{column} = \text{floor}\left(\frac{\text{Longitude} - x_0}{\text{CellSize}}\right)\]

where \((x_0, y_0)\) is the lower-left corner point. If the center of the lower-left cell is given instead, then \(x_0 = x_\text{llcenter} - \frac{\text{CellSize}}{2}\) and \(y_0 = y_\text{llcenter} - \frac{\text{CellSize}}{2}\).
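A minimal sketch of these formulas in Python, using example grid parameters (a global 1/120th-degree grid with a lower-left corner origin):

import numpy as np

# Example grid parameters: a global grid at 1/120th of a degree, with the
# origin at the lower-left corner of the lower-left cell
x0, y0 = -180.0, -90.0
cellsize = 1 / 120.

def latlon_to_rowcol(lat, lon):
    # Rows count up from the south and columns from the west, per the formulas above
    row = int(np.floor((lat - y0) / cellsize))
    col = int(np.floor((lon - x0) / cellsize))
    return row, col

print(latlon_to_rowcol(40.7128, -74.0060))  # e.g., New York City -> (15685, 12719)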

For CSV files, you will need to keep track of this information yourself. ASC files have it at the top of the file, BIL files have a corresponding HDR file containing it, and GeoTIFF files have it embedded in the file, where it can be read with various software tools.
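For example, ASC headers follow a simple key-value layout (ncols, nrows, xllcorner or xllcenter, yllcorner, cellsize, and usually NODATA_value), so you can read the grid scheme with a few lines of Python; the filename here is hypothetical:

# Read the grid scheme from an ASC header; the filename is hypothetical
header = {}
with open("weights.asc") as fp:
    for _ in range(6):  # ASC headers are typically 6 lines, before the data rows
        key, value = fp.readline().split()
        header[key.lower()] = float(value)

print(header)  # e.g., {'ncols': 43200.0, 'nrows': 21120.0, 'xllcorner': -180.0, ...}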

The geographic projection#

Projections are a way to map points on the globe (in latitude-longitude space) to a point in a flat x, y space. While this is important for visualizing maps, it can just be a nuisance for gridded datasets. The most common “projection” for gridded datasets is an equirectangular projection, which we have been assuming above. This is variously referred to as lat-long (“ll”), WGS 84, and EPSG:4326 (technically, WGS 84 specifies how latitude and longitude are defined, and EPSG:4326 specifies a drawing scheme where x = longitude and y = latitude). However, you will sometimes encounter grids expressed in km north and km east of a reference point, and then you may need to project these back to latitude-longitude and regrid them.
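If you do encounter such a grid, a projection library can convert the projected coordinates back to latitude-longitude. A minimal sketch using pyproj (an assumption; any projection library works), with an assumed UTM zone as the example source CRS:

import pyproj

# Convert projected (meters east/north) coordinates back to latitude-longitude.
# EPSG:32614 (UTM zone 14N) is an assumed example; use your dataset's actual CRS.
transformer = pyproj.Transformer.from_crs("EPSG:32614", "EPSG:4326", always_xy=True)
lon, lat = transformer.transform(500000, 4649776)  # x (m east), y (m north)
print(lon, lat)  # approximately -99, 42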

Handling of missing data#

All of these formats can represent missing data. Typically, a specific numerical value, like -9999, is reserved for it. This value is specified in the same place as the gridding scheme (e.g., in the ASC or HDR header).
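Before using the weights, convert the reserved value to a proper missing value. A minimal numpy sketch, assuming -9999 is the declared nodata value:

import numpy as np

# Mask missing values, assuming -9999 is the declared nodata value;
# `rr` is a grid read as in the next section
rr = rr.astype(float)
rr[rr == -9999] = np.nan  # missing cells then drop out of np.nansum / np.nanmean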

Implementation notes: Reading gridded data#

In R, use the raster library. For example:

library(raster)
rr <- raster(filename)

See also

If you are using R, take a look at the Introduction to Geospatial Raster and Vector Data with R, which has some extensive examples of working with geospatial raster data.

Take a look at the Python package rasterio.

In some cases, it is appropriate and possible to use time-varying weighting schemes. For example, if population impacts are being studied, and the scale of the model is individuals, annual estimates of population can be used. This kind of data often comes either in NetCDF format (see above) or as a collection of files (e.g., one per year).
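A minimal sketch of loading such a collection, assuming one GeoTIFF per year with a hypothetical filename pattern:

import rasterio

# Load a time-varying weighting scheme stored as one file per year;
# the filename pattern is hypothetical
weights_by_year = {}
for year in range(2000, 2021):
    with rasterio.open(f"population_{year}.tif") as src:
        weights_by_year[year] = src.read(1)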

Aligning weather and weighting grids#

The first step to using a gridded weighting dataset is to make it conform to the data grid definition used by your weather data. This generally requires resampling the weighting data, increasing its resolution by some factor. You want two conditions to be met after resampling: (A) the resampled cell size of the weighting data should evenly divide the cell size of the weather data; (B) the meridional and zonal grid lines of the weather data should coincide with grid lines of the resampled weighting data.

Here we assume that both are regular latitude-longitude grids. See working with gridded weighting data to understand the grid scheme for your weighting file; note that gridded weather data often reports the center of each grid cell, rather than the corner.

If your datasets cover the same geographic extent, and if the resolution of your weighting dataset is an even multiple of the resolution of your weather dataset, your job is easy. You’ll simply aggregate or disaggregate grid cells so that they have the same resolution as the weather data.

In situations that don’t meet these conditions, you will need some combination of the three operations illustrated below: upsampling, downsampling, and cropping.

Upsampling - increasing data resolution#

In the code below, the variable rr represents the data on the grid: a matrix or 2D array where each element corresponds to a value (like temperature, elevation, or some other measurement) at a particular grid cell. The row and column indices of the matrix map to spatial coordinates.

To increase the resolution of a grid rr by a factor of N without increasing the sum of grid cells:

## R (raster)
rr2 <- disaggregate(rr, fact=N) / (N^2)

# Python (numpy)
rr2 = np.repeat(np.repeat(rr, N, axis=0), N, axis=1) / (N*N)

Downsampling - decreasing data resolution#

To decrease the resolution of a grid rr by a factor of N without decreasing the sum of grid cells:

## R (raster): fun=sum preserves the grid total
rr2 <- aggregate(rr, fact=N, fun=sum)

# Python (numpy): sum each N x N block (no division, so the total is preserved)
rr2 = np.zeros((rr.shape[0] // N, rr.shape[1] // N))
for ii in range(N):
    for jj in range(N):
        rr2 += rr[ii::N, jj::N]
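As a sanity check, upsampling and then downsampling by the same factor should recover the original grid and leave its total unchanged. A minimal numpy sketch with a made-up 4x4 grid:

import numpy as np

rr = np.arange(16, dtype=float).reshape(4, 4)

# Upsample by 2, spreading each cell's value over a 2x2 block
up = np.repeat(np.repeat(rr, 2, axis=0), 2, axis=1) / 4

# Downsample by 2, summing each 2x2 block back together
down = up[0::2, 0::2] + up[0::2, 1::2] + up[1::2, 0::2] + up[1::2, 1::2]

assert np.isclose(up.sum(), rr.sum())
assert np.allclose(down, rr)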

Cropping - adjusting the extent of data#

To adjust the spatial extent of grid rr to conform as closely as possible to a box from the longitudinal range WW to EE and the latitudinal range SS to NN:

## R (raster): note that extent() takes (xmin, xmax, ymin, ymax)
rr2 <- crop(rr, extent(WW, EE, SS, NN))

# Python (numpy): we need the original extent of the data (W0, S0, E0, N0)
# and the spatial resolution (D); this assumes row 0 is the southern edge,
# following the matrix convention above
rr2 = rr[int((SS - S0)/D):int((NN - S0)/D), int((WW - W0)/D):int((EE - W0)/D)]

The following recipe should work for most cases to align weighting data with a weather grid.

  1. Upsample the weighting data until the grid of the weighting data evenly divides up the weather data. Start by considering the southwest corner of each dataset. Even if the datasets are of the same resolution, upsampling may be required to make the grid cells line up. For example, if the weighting data has a southwest corner of (0 N, 0 E) and a resolution of \(1\) degree, while the weather data starts at (0.5 N, 0.5 E) and a resolution of \(1\) degree, the weighting data will need to be upsampled by a factor of \(2\), so that it provides a grid line at (0.5 N, 0.5 E).

  2. Clip the two datasets so that they line up. After step 1, it should be possible to clip the two datasets to the exact same extent.

  3. Re-aggregate the weighting data. Now, downsample the weighting data to the resolution of the weather data.

Example#

Suppose the weighting data is nearly global, from 180°W to 180°E and 90°S to 86°N, as is the case with LandScan population data, with a resolution of 1/120th of a degree. You want to use this to weight PRISM weather data for the USA, with an extent of 125.0208°W to 66.47917°W and 24.0625°N to 49.9375°N, and a resolution of 1/24th of a degree.

landscan <- raster("…/w001001.adf")
landscan
## class       : RasterLayer
## dimensions  : 21120, 43200, 912384000  (nrow, ncol, ncell)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : -180, 180, -90, 86  (xmin, xmax, ymin, ymax)
prism <- raster("PRISM_tmax_stable_4kmM2_2000_all_asc")
prism
## class       : RasterLayer
## dimensions  : 621, 1405, 872505  (nrow, ncol, ncell)
## resolution  : 0.04166667, 0.04166667  (x, y)
## extent      : -125.0208, -66.47917, 24.0625, 49.9375  (xmin, xmax, ymin, ymax)
import rasterio

with rasterio.open("../w001001.adf") as src:
    landscan = src.read(1)  # This reads the first band of the raster into a 2D numpy array

print(landscan)

Start by throwing away extraneous data, by cropping the LandScan to, say, 126 to 66°W, 24 to 50°N.

landscan <- crop(landscan, extent(-126, -66, 24, 50))

In Python, you can impose the window when you read the raster file. We will open the raster file again here, for completeness.

import rasterio
from rasterio.windows import from_bounds

with rasterio.open("../w001001.adf") as src:
    xmin, xmax, ymin, ymax = -126, -66, 24, 50

    # Calculate the window to crop
    window = from_bounds(xmin, ymin, xmax, ymax, src.transform)

    # Read the data from this window
    landscan = src.read(1, window=window)

print(landscan)

Now, note that the edge of the PRISM data falls in the middle of the LandScan grid cells: \(120 \times (180 - 125.0208) = 6597.5\). That means that you need to increase the resolution of the LandScan data by a factor of 2 to line it up. In general, you will need to increase it by a factor of 1 / (the trailing fractional part).

landscan <- disaggregate(landscan, fact=2) / 4
import numpy as np

# Disaggregate
fact = 2
landscan = np.repeat(np.repeat(landscan, fact, axis=0), fact, axis=1) / (fact**2)

We divide by 4 so that the total population remains the same.
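If you want to compute the factor programmatically, exact fractions avoid floating-point surprises. A sketch, using the fact that the PRISM west edge, -125.0208, is exactly \(-125 - 1/48\) degrees:

from fractions import Fraction

# The PRISM west edge, -125.0208..., is exactly -125 - 1/48 degrees
west_edge = Fraction(-125) - Fraction(1, 48)

# Offset of that edge from the LandScan west edge, in LandScan cells (1/120 degree)
offset = (west_edge - Fraction(-180)) * 120   # = 6597.5

# The denominator of the fractional part is the required upsampling factor
factor = (offset % 1).denominator             # = 2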

After increasing the resolution of the LandScan data, we clip it again.

landscan <- crop(landscan, extent(-125.0208, -66.47917, 24.0625, 49.9375))
xmin, xmax, ymin, ymax = -125.0208, -66.47917, 24.0625, 49.9375

# landscan is now a plain numpy array (no .transform or .read), so crop by
# index: after the earlier crop and disaggregation, it spans 126-66°W and
# 24-50°N at 1/240th of a degree, with row 0 at the northern edge
D = 1 / 240.
rows = slice(int(round((50 - ymax) / D)), int(round((50 - ymin) / D)))
cols = slice(int(round((xmin - (-126)) / D)), int(round((xmax - (-126)) / D)))
landscan = landscan[rows, cols]

Now, the resolution of the dataset has become 1/240th of a degree, and we can aggregate by a factor of \(10\) for it to match the PRISM data:

landscan <- aggregate(landscan, fact=10, fun=sum)
# Dimensions for the aggregated data
new_shape = (landscan.shape[0] // 10, landscan.shape[1] // 10)

# Use numpy to reshape and aggregate by summing
reshaped = landscan.reshape((new_shape[0], 10, new_shape[1], 10))

landscan = reshaped.sum(axis=1).sum(axis=2)
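With the grids aligned, the payoff is the weighted aggregation itself. A minimal sketch, assuming prism_values holds the PRISM raster as a numpy array with the same shape as landscan:

import numpy as np

# Population-weighted average of the PRISM values; `prism_values` is assumed
# to be the PRISM raster as a numpy array aligned cell-by-cell with `landscan`
valid = ~np.isnan(prism_values) & (landscan > 0)
weighted_avg = np.sum(prism_values[valid] * landscan[valid]) / np.sum(landscan[valid])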

Plotting your results#

Now take a moment to visualize the data that you have created. This is a good way to make sure you haven’t made any mistakes and to wow your less climate-adept colleagues in presentations.

To get an initial view of your data, you can just plot your data as a matrix. In R, use the image function; similar functions exist in other languages. The trick is to make sure that you specify the coordinates when you plot the map.

Here’s an easy case, using population data from the Gridded Population of the World dataset.

library(raster)
## Load the data
rr <-
raster("gpw_v4_population_density_adjusted_to_2015_unwpp_country_totals_rev11_2020_2pt5_min.asc")

## Display it!
image(rr)

Result of plotting the GPW population data with image().

import rasterio
import matplotlib.pyplot as plt

# Load the raster data
with rasterio.open("gpw_v4_population_density_adjusted_to_2015_unwpp_country_totals_rev11_2020_2pt5_min.asc") as src:
    rr = src.read(1)

# Display the raster data
plt.imshow(rr, cmap='gray')  # You can change the colormap (cmap) as needed
plt.colorbar()
plt.show()

But that wasn’t any fun. Let’s try again with something more complicated.

First, we’ll download historical maximum temperature data from the easy-to-use IRI data library. Click on the “Data Files” tab and then choose the “netCDF” file format option.

library(ncdf4)

## Load the data
nc <- nc_open("data.nc")
temp <- ncvar_get(nc, 'temp')

## Display it!
image(temp)

Result of plotting the NetCDF temperature data with image().

import netCDF4 as nc
import matplotlib.pyplot as plt

# Load the netCDF data
dataset = nc.Dataset("data.nc")
temp = dataset.variables['temp'][:]

# Display the data
plt.imshow(temp, cmap='viridis')  
plt.colorbar()
plt.show()

This is R’s default way of showing matrices, with axes that go from 0 to 1. What’s worse, the map is upside-down, though it will take some staring to convince yourself of this. The reason is that NetCDFs usually have the upper-left corner representing the extreme north-west, but R’s image command shows the upper-left corner in the lower-left.

We are also going to plot the countries, so the map is easier to interpret. To do that, we need to rearrange the data so longitude goes from -180 to 180, rather than 0 to 360 as it currently does. Here’s our second attempt:

## Extract the coordinate values
lon <- ncvar_get(nc, "X")
lat <- ncvar_get(nc, "Y")

## Rearrange longitude to go from -180 to 180
lon2 <- c(lon[lon >= 180] - 360, lon[lon < 180])
temp2 <- rbind(temp[lon >= 180,], temp[lon < 180,])

## Display it, with map!
library(maps)
image(lon2, rev(lat), temp2[,ncol(temp2):1])
map("world", add=T)

Result after rearranging the longitudes and overlaying country borders.

import netCDF4 as nc
import matplotlib.pyplot as plt
import numpy as np

# Load the netCDF data
dataset = nc.Dataset("data.nc")
temp = dataset.variables['temp'][:]
lon = dataset.variables['X'][:]
lat = dataset.variables['Y'][:]

# Rearrange longitude to go from -180 to 180, sorting the data to match
# (this assumes temp is indexed lon-first, matching the R code; transpose if not)
lon2 = np.where(lon >= 180, lon - 360, lon)
order = np.argsort(lon2)
lon2 = lon2[order]
temp2 = temp[order]

# Display it
plt.pcolormesh(lon2, lat, temp2.T, shading='auto')
plt.show()

Now, for our production-ready map, we’re going to switch to ggplot2. In ggplot, all data needs to be in dataframes, so we need to convert the matrix into a dataframe (with melt) and the map into a dataframe (with map_data):

## Convert temp2 to a dataframe
library(reshape2)
rownames(temp2) <- lon2
colnames(temp2) <- lat
temp3 <- melt(temp2, varnames=c('lon', 'lat'))

## Convert world map to a dataframe
library(ggmap)
world <- map_data("world")

## Plot everything
ggplot() +
    geom_raster(data=temp3, aes(x=lon, y=lat, fill=value)) +
    geom_polygon(data=world, aes(x=long, y=lat, group=group), colour='black', fill=NA)

Result after converting to dataframes and plotting with ggplot2.

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Convert temp2 to a long-format dataframe
df_temp = pd.DataFrame(temp2, index=lon2, columns=lat).stack().reset_index()
df_temp.columns = ['lon', 'lat', 'value']

# Convert world map to a GeoDataFrame (naturalearth_lowres ships with geopandas < 1.0)
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# Plot everything; pcolormesh needs a 2D grid, so pivot the dataframe back
fig, ax = plt.subplots(figsize=(10, 6))
grid = df_temp.pivot(index='lat', columns='lon', values='value')
mesh = ax.pcolormesh(grid.columns, grid.index, grid.values, shading='auto')
world.boundary.plot(ax=ax, linewidth=1, color='black')
fig.colorbar(mesh)
plt.show()

And now we’re ready to make a production-ready graph! The biggest change will be the addition of a map projection. As mentioned above, map projections translate points on a globe into points on a screen. You’ll want to choose your projection carefully, since people are bound to judge you for it.

See also

Choosing a map projection is beyond the scope of this tutorial; however, you can take a look at this overview from Jochen Albrecht or how Randall Munroe thinks about them.


Fig. 8 Considerations for projections. Source: XKCD 977.#

Using the projection, we can now make the final version of this figure. Note that you will need to use geom_tile rather than geom_raster when plotting grids over projections, and this can be quite a bit slower. We use a color palette from ColorBrewer, which is an excellent resource for choosing colors.

library(RColorBrewer)
ggplot() +
    geom_tile(data=temp3, aes(x=lon, y=lat, fill=value - 273.15)) +
    geom_polygon(data=world, aes(x=long, y=lat, group=group), colour='black', fill=NA, lwd=.2) +
    coord_map(projection="mollweide", ylim=c(-65, 65)) + xlim(-180, 180) +
    theme_light() + theme(panel.ontop=TRUE, panel.background=element_blank()) +
    xlab(NULL) + ylab(NULL) + 
    scale_fill_distiller(name="Average\nMax T.", palette="YlOrRd", direction=1) +
    theme(legend.justification=c(0,0), legend.position=c(.01,.01))

Result after projecting to Mollweide and applying the final styling.

import cartopy.crs as ccrs
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable

# Create the plot with a Mollweide projection
fig, ax = plt.subplots(figsize=(10, 6), subplot_kw={'projection': ccrs.Mollweide()})
divider = make_axes_locatable(ax)

# Convert the temperature data to Celsius and plot the 2D grid from before
c = ax.pcolormesh(grid.columns, grid.index, grid.values - 273.15,
                  transform=ccrs.PlateCarree(), shading='auto',
                  cmap=mcolors.LinearSegmentedColormap.from_list("", ["white", "yellow", "red"]))

# Plot the world boundaries
world.boundary.plot(ax=ax, transform=ccrs.PlateCarree(), linewidth=0.2)

# Styling (on recent cartopy versions, the map frame lives in ax.spines['geo'])
ax.set_global()
ax.set_title("Average Max T.")
ax.spines['geo'].set_visible(False)

# Colorbar styling
cax = divider.append_axes("right", size="5%", pad=0.1, axes_class=plt.Axes)
fig.colorbar(c, cax=cax, label="Average Max T. (°C)")

plt.show()

See also

The proper use of color in scientific plots ensures accurate data representation and accessibility, especially for those with color-vision deficiencies. Check out this paper on color maps for best practices and guidelines.