Choosing and downloading weather and climate data products#
The following resources are invaluable when working with weather data:
- UCAR Climate Data Guide: an encyclopedia for weather and climate data products with expert guidance on strengths and weaknesses for most commonly-used datasets
- Reanalysis.org: a forum and wiki for makers and users of reanalyses with a focus on evaluating data products and comparing them with observational data
- Reanalysis and Observational Datasets and Variables: a "who's who" of historical weather products with basic facts about each
These resources will help you determine which data product is right for you. They will also help you better interpret results from existing studies. (For example, the NCEP2 reanalysis data product, which was commonly used in economics and policy studies, has known issues, including larger biases in the Southern Hemisphere.)
The following section shows several examples of choosing and downloading weather data given a region and variable of interest. Though there are many more products than the three introduced here, each with their own download procedures and quirks, these examples show a few common setups that you may encounter (CHIRPS: ftp directory; BEST: website browser; ERA5: data storage system).
Generally, it's good practice to first research which data products are appropriate for your variable and area of interest. Questions to ask include:
- Does the data product have the variable I need?
- Are the data available at the resolution I need?
- Are biases reasonable for the variable and region of interest?
The answers to the first two questions are usually easy to find on the website of each dataset. The third question is more complex; the UCAR Climate Data Guide introduced above is a good first place to look. A Google Scholar search of the form "[data product name] validation OR evaluation OR bias OR uncertainty" may be useful as well.
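The search pattern above can be turned into a small helper when you want to check several products in a row. This is a minimal sketch: the function name is made up for illustration, and quoting the product name so that it is searched as a phrase is an added assumption, not part of the pattern above.

```python
from urllib.parse import urlencode


def scholar_validation_query(product_name):
    """Return a Google Scholar search URL for validation studies of a data product.

    Hypothetical helper for illustration only.
    """
    # Assumption: wrap the product name in quotes so multi-word names match as a phrase.
    terms = f'"{product_name}" validation OR evaluation OR bias OR uncertainty'
    return "https://scholar.google.com/scholar?" + urlencode({"q": terms})


print(scholar_validation_query("CHIRPS"))
```

Adding a region name to the product name (for example "CHIRPS Ethiopia") narrows the results to studies relevant to your area of interest.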
Getting started with an observational data product: BEST and CHIRPS#
Say you're looking at agriculture in Ethiopia. You would like both temperature and precipitation data (remember the warning on hydrological variables), and would like to use observational datasets. You consider BEST for temperature because of its daily output, and CHIRPS, a hybrid station-satellite data product, for precipitation because you found literature specifically examining its biases in your region of interest.
| | CHIRPS | BEST |
|---|---|---|
| 1. Understand the Data Product | CHIRPS is unfortunately not covered on the UCAR Climate Data Guide. However, you find several articles specifically validating it in Ethiopia (e.g., Dinku et al. 2018 or Gebrechorkos et al. 2018). You see that satellite data products are more biased south of the Rift Valley than north of it. You also see that CHIRPS tends to overestimate rainfall. You consider how these biases may affect your results. | BEST is covered in the Climate Data Guide. You see that it is able to provide high-resolution data because it includes incomplete and partial station records that other global data products may throw out. However, you also see that the data is highly smoothed, meaning that it will likely be more biased in areas with large heterogeneity in temperature - for example in the mountainous highlands of Ethiopia. You resolve to use different sources to check for robustness. |
| 2. Prepare to Download the Data | CHIRPS data is stored in a publicly accessible directory. You navigate to the | Click on "Get Data (external)" on the Climate Data Guide website, taking you to Berkeley Earth's data overview page. You navigate down to the section on "Gridded Data". You'll have to click on every decade separately, but without further ado, clean NetCDF files are being downloaded to your machine. |
| 3. Accessing the Data | Unfortunately, the data is not in | The filenames, as is typical for observational datasets, are in their own format - so you might want to rename them into CMIP format just for ease of reading. By reading the NetCDF header, you note that the grid variables are stored as |
Note
Most weather products will require some bureaucracy (creating accounts, signing data agreements, etc.) to download data, and most have their own quirks about how they want data to be downloaded. CHIRPS and BEST do not require bureaucracy, but CHIRPS will require some scripting to download.
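As a rough illustration of the kind of scripting a file directory may require, the sketch below builds one URL per year and downloads each file. Everything specific in it is a placeholder: `BASE_URL`, `FILE_PATTERN`, and the function names are hypothetical, and you would need to replace them with the directory and naming scheme listed on the data provider's website.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical placeholders: replace with the real directory and file naming
# pattern documented by the data provider.
BASE_URL = "https://example.org/path/to/daily/netcdf"
FILE_PATTERN = "product_daily_{year}.nc"


def build_urls(first_year, last_year, base_url=BASE_URL, pattern=FILE_PATTERN):
    """Return one URL per year, inclusive of both end years."""
    return [f"{base_url}/{pattern.format(year=y)}"
            for y in range(first_year, last_year + 1)]


def download_all(urls, out_dir="data"):
    """Download each URL into out_dir, skipping files that already exist."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = out / url.rsplit("/", 1)[-1]
        if target.exists():
            continue  # lets an interrupted run be resumed without re-downloading
        urlretrieve(url, str(target))


if __name__ == "__main__":
    urls = build_urls(1981, 1983)
    print("\n".join(urls))
    # download_all(urls)  # uncomment once BASE_URL points at a real directory
```

Skipping files that already exist is a deliberate choice: downloads of multi-year records often get interrupted, and this makes the script safe to rerun.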
These datasets are stored in different geographical grids and will need to be regridded to a common grid, using tools like xesmf in Python. See also weighting schemes.
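To make the idea of regridding concrete, here is a minimal pure-Python sketch that aggregates a fine grid onto a coarser one by block averaging, weighting each cell by the cosine of its latitude as a proxy for cell area. It is a stand-in for illustration, not what xesmf does internally: it only handles coarsening by an integer factor, assumes the fine cells nest exactly inside the coarse ones, and ignores missing values. The function name and the toy data are made up.

```python
import math


def coarsen(field, lats, factor):
    """Aggregate a 2-D field (rows = latitude, columns = longitude) by `factor`
    in both directions, weighting each fine cell by cos(latitude)."""
    n_rows, n_cols = len(field), len(field[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    weights = [math.cos(math.radians(lat)) for lat in lats]
    coarse = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            num = den = 0.0
            for r in range(r0, r0 + factor):
                for c in range(c0, c0 + factor):
                    num += weights[r] * field[r][c]
                    den += weights[r]
            row.append(num / den)  # area-weighted mean of the block
        coarse.append(row)
    return coarse


# Toy example: a 4 x 4 grid of 0.25-degree cells aggregated to 2 x 2.
lats = [9.875, 9.625, 9.375, 9.125]
field = [[1.0, 2.0, 3.0, 4.0],
         [1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [5.0, 6.0, 7.0, 8.0]]
print(coarsen(field, lats, 2))
```

For real analyses, a dedicated regridding tool is the better choice, since it handles arbitrary source and target grids, masks, and conservative remapping.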
Getting started with a reanalysis data product: ERA-5#
Say you're studying heat waves in the Sahel. Weather station coverage is sparse, so you need a gridded data product. You consider ERA5, the most advanced modern reanalysis data product as of 2019, recently released by the European Centre for Medium-Range Weather Forecasts (ECMWF) (which incidentally also produces the world's most respected hurricane forecast model).
| | ERA-5 |
|---|---|
| 1. Understand the Data Product | |
| 2. Prepare to Download the Data | Note: GRIB is another meteorological data format - it's less common and less flexible than NetCDF but slightly more efficient in storage. GRIB files can be converted easily to NetCDF files through command-line tools such as cdo. |
| 3. Accessing the Data | |
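The GRIB-to-NetCDF conversion mentioned in the note can be scripted. The sketch below wraps the `cdo -f nc copy` command in Python; it assumes cdo is installed and on your PATH, and the function names and file names are hypothetical.

```python
import shutil
import subprocess


def grib_to_netcdf_cmd(grib_path, nc_path):
    """Build the cdo command that copies a GRIB file into NetCDF format."""
    return ["cdo", "-f", "nc", "copy", grib_path, nc_path]


def convert(grib_path, nc_path):
    """Run the conversion, failing early if cdo is not installed."""
    if shutil.which("cdo") is None:
        raise RuntimeError("cdo not found on PATH")
    subprocess.run(grib_to_netcdf_cmd(grib_path, nc_path), check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
print(" ".join(grib_to_netcdf_cmd("era5_t2m.grib", "era5_t2m.nc")))
```

Passing the command as a list rather than a single string avoids shell quoting problems when file paths contain spaces.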
Caution
Many datasets, especially those from smaller institutions, will not give up their secrets easily. Be prepared to have to deal with wget scripts, jblob scripts, writing ftp scripts, and so forth, with well-meaning but poorly-written accompanying documentation. In some of these cases, it might be fastest to call up your best climate researcher friend, who may be able to just share their scripts with you.
Caution
Climate and weather data can be massive. For example, the full, hourly, global record of a set of 9 commonly-used near-surface variables in ERA5 (including temperature and precipitation) comes out to roughly 7 TB of disk space in total. Consequently, data products tend to be saved in smaller chunks, or allow for subsetting before downloading. Depending on the scale of your analysis, you will likely need additional storage beyond your personal machine, for example on external servers. More recently, some datasets have also been made available on cloud servers such as Pangeo or Google Earth Engine.
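A back-of-the-envelope calculation shows where a figure of this size comes from. Every input below is an assumption chosen for illustration (a 0.25-degree global grid, 40 years of hourly data, values stored as 2-byte packed integers), not a number taken from the data provider.

```python
# All inputs are illustrative assumptions, not figures from the data provider.
N_LAT, N_LON = 721, 1440          # assumed 0.25-degree global grid
HOURS_PER_YEAR = 24 * 365.25      # average year length, including leap years
N_YEARS = 40                      # assumed length of the hourly record
N_VARIABLES = 9                   # number of near-surface variables
BYTES_PER_VALUE = 2               # assumed 16-bit packed storage


def record_size_tb(n_lat=N_LAT, n_lon=N_LON, hours_per_year=HOURS_PER_YEAR,
                   n_years=N_YEARS, n_variables=N_VARIABLES,
                   bytes_per_value=BYTES_PER_VALUE):
    """Estimate uncompressed storage in terabytes (1 TB = 1e12 bytes)."""
    n_values = n_lat * n_lon * hours_per_year * n_years * n_variables
    return n_values * bytes_per_value / 1e12


print(f"{record_size_tb():.1f} TB")
```

Under these assumptions the estimate lands at roughly 6.6 TB, the same order of magnitude as the figure quoted above. Changing the arguments shows how quickly subsetting helps: a single variable, or a small region, cuts the total proportionally.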
Thinking ahead to climate projections#
Research linking social outcomes to weather variations often aims to project results into the future to estimate the impact of climate change on the variable of interest. We have chosen (at least for now) not to expand this guide to include information on climate projection because of its immense complexity. Oftentimes a more sophisticated understanding of how models work and their uncertainties is needed to avoid underestimating propagated uncertainties in your final estimates. Even more so than with weather data products, there is no right or correct climate model, or group of models, to use (see e.g., Knutti 2010 or Collins 2017). Emissions scenarios, the response of the models to emissions scenarios, intermodel variability, and intra-model variability all add to the uncertainty in your projection, and their relative strength may depend on the timescale and aims of your study.
See also
To get started thinking about incorporating changes in climate into your analysis, we also recommend: "On the use and misuse of climate change projections in international development" by Nissan et al. (2019) and "Using Weather Data and Climate Model Output in Economic Analyses of Climate Change" by Auffhammer et al. (2013).
However, if you plan to project results into the future, you can start thinking about the logistics now. Climate data comes from imperfect models whose raw output generally has to be "bias-corrected" before being used in econometric or policy research contexts. Bias-correction involves using information from a weather dataset to inform the output of a climate model, either by applying model changes to the weather data (so-called "delta-method" projection) or by adjusting the model output by applying a historical difference between the model and weather data to the future model output. We won't go into details about these methods (like everything in this field, they have their strengths and weaknesses), but you should generally use data that has been bias-corrected to the same weather dataset you are using to inform your econometric model. Oftentimes this bias-correction is still conducted by the end user themselves, but some pre-bias-corrected climate projections exist. For example, NASA's NEX-GDDP dataset is bias-corrected to the Global Meteorological Forcing Dataset (GMFD) for Land Surface Modeling historical dataset.
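The two approaches described above can be sketched in their simplest additive form, using only the means of each series. This is a toy illustration under strong assumptions: real bias-correction is usually done separately by month or by quantile, and the function names and temperature values below are made up.

```python
from statistics import mean


def delta_method(obs_hist, model_hist, model_future):
    """Add the model's projected change in the mean to the observed series."""
    delta = mean(model_future) - mean(model_hist)
    return [x + delta for x in obs_hist]


def mean_bias_adjust(obs_hist, model_hist, model_future):
    """Subtract the model's historical mean bias from the future model series."""
    bias = mean(model_hist) - mean(obs_hist)
    return [x - bias for x in model_future]


# Made-up temperatures in degrees Celsius, for illustration only.
obs_hist = [24.0, 25.0, 26.0]        # weather data, historical period
model_hist = [26.0, 27.5, 27.5]      # model output, same historical period
model_future = [28.0, 30.0, 29.0]    # model output, future period

print(delta_method(obs_hist, model_hist, model_future))
print(mean_bias_adjust(obs_hist, model_hist, model_future))
```

In this additive version both results have the same mean, but the delta method keeps the variability of the observed series while the bias adjustment keeps the variability of the model series. Which of the two is preferable depends on which source you trust more for variability.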
A quick summarizing note#
This process may seem overwhelming, especially given the large variety of data products that exist, and the sometimes rather opaque processes for figuring out what works best.
If a regional observational dataset exists for the region and variables you wish to examine, you should start with it. Alternatively, you may use a well-understood global observational dataset. Don't use a dataset or a data assimilation methodology just because previous work (even big-name papers) has used it. There are enough examples in the literature of problematic uses of weather and climate data (for examples of discussions about these issues, see Fisher et al. 2012 and Burke et al. 2015).
Furthermore, check your results with multiple datasets from the latest generation! Consider performing your analysis with a purely station-based dataset and one that includes satellite data; or compare results to those from a reanalysis dataset if you are worried about statistical interpolation in your region of interest. This may not make a huge difference for more stable variables in areas with high station coverage (e.g., temperature in North America), but could be a useful robustness check for more problematic ones (e.g., precipitation). If the choice of "historical" dataset changes your results, think about how their biases may interact with your analysis to figure out what's causing the discrepancy.