Daily and Annual PM2.5, O3, and NO2 Concentrations at ZIP Codes for the Contiguous U.S., 2000-2016, v1.0

The Daily and Annual PM2.5, O3, and NO2 Concentrations at ZIP Codes for the Contiguous U.S., 2000-2016, v1.0 data set contains daily and annual concentration predictions for fine particulate matter (PM2.5), ozone (O3), and nitrogen dioxide (NO2) at the ZIP Code level for the years 2000 to 2016. An ensemble of three machine-learning models (Random Forest, Gradient Boosting, and Neural Network) was used to estimate daily PM2.5, O3, and NO2 at the centroids of 1km x 1km grid cells across the contiguous U.S. for 2000 to 2016. The predictors included air monitoring data, satellite aerosol optical depth, meteorological conditions, chemical transport model simulations, and land-use variables. The ensemble models showed strong predictive performance, with 10-fold cross-validated R-squared values of 0.86 for PM2.5, 0.86 for O3, and 0.79 for NO2. These high-resolution, well-validated predictions support accurate estimates of ZIP Code-level pollution concentrations. For general ZIP Codes with polygon representations, pollution levels were estimated by averaging the predictions of the grid cells whose centroids lie inside the ZIP Code's polygon. Other ZIP Codes, such as Post Offices or large-volume single customers, were treated as single points, and each was assigned the prediction of its nearest grid cell. The polygon shapes and the point latitudes and longitudes for ZIP Codes were obtained from Esri and the U.S. ZIP Code Database and were updated annually. The data include about 31,000 general ZIP Codes with polygon representations and about 10,000 ZIP Codes represented as single points.
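The two aggregation rules described above (averaging grid cells whose centroids fall inside a ZIP Code polygon, and assigning the nearest grid cell to a point ZIP Code) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the toy coordinates and values, the use of planar coordinates with straight-line distance, and the choice to return None when a polygon contains no centroid are all assumptions made for illustration.

```python
from statistics import mean


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_zip_mean(polygon, cells):
    """Average the predictions of grid cells whose centroids lie inside the polygon.

    cells is a list of (x, y, value) tuples. Returning None for a polygon that
    contains no centroid is an assumption; the source does not describe that case.
    """
    values = [v for x, y, v in cells if point_in_polygon(x, y, polygon)]
    return mean(values) if values else None


def point_zip_value(px, py, cells):
    """Assign the prediction of the grid cell whose centroid is nearest to (px, py)."""
    nearest = min(cells, key=lambda c: (c[0] - px) ** 2 + (c[1] - py) ** 2)
    return nearest[2]


# Toy grid-cell centroids (planar km coordinates) with made-up concentrations.
cells = [(0.5, 0.5, 8.0), (1.5, 0.5, 10.0), (0.5, 1.5, 12.0), (2.5, 0.5, 20.0)]
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]  # hypothetical ZIP polygon

polygon_estimate = polygon_zip_mean(square, cells)  # mean of the three cells inside
point_estimate = point_zip_value(2.4, 0.6, cells)   # value of the closest centroid
```

With real data, the polygons and points are given in latitude and longitude, so a projected coordinate system or a geodesic distance would be needed in place of the planar distance used here.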
The aggregated ZIP Code-level daily predictions can be used in research areas such as environmental epidemiology, environmental justice, health equity, and political science by linking them with ZIP Code-level demographic and medical data sets, including national inpatient care records, medical claims data, census data, the U.S. Census Bureau American Community Survey (ACS), and the Area Deprivation Index (ADI). The data are particularly useful for studies of rural populations, which are under-represented because rural areas have few air monitoring sites. Compared with the 1km grid data, the ZIP Code-level predictions are much smaller in size and are manageable in personal computing environments. This lowers a key barrier to participation in air pollution research and makes the data accessible to scientists in different fields. The units are ug/m^3 for PM2.5 and ppb for O3 and NO2.
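As a rough illustration of the linkage described above, the sketch below joins ZIP Code-level predictions to another ZIP Code-level table using only the Python standard library. The column names (zip, year, pm25, population) and all values are hypothetical and do not reflect the actual file layout of this data set. One practical detail it shows is restoring leading zeros, since ZIP Codes stored as numbers lose them.

```python
import csv
import io

# Hypothetical file contents with made-up values; real column names may differ.
pollution_csv = "zip,year,pm25\n2138,2016,7.1\n10027,2016,8.4\n99999,2016,5.0\n"
demographics_csv = "zip,population\n02138,100\n10027,200\n"


def normalize_zip(raw):
    """Return a 5-character ZIP Code string, restoring any dropped leading zeros."""
    return str(raw).strip().zfill(5)


def link_by_zip(pollution_rows, other_rows):
    """Inner-join two lists of dict rows on the normalized ZIP Code."""
    lookup = {normalize_zip(r["zip"]): r for r in other_rows}
    linked = []
    for row in pollution_rows:
        key = normalize_zip(row["zip"])
        match = lookup.get(key)
        if match is not None:
            # Combine both records and keep the normalized ZIP Code as the key.
            linked.append({**match, **row, "zip": key})
    return linked


pollution_rows = list(csv.DictReader(io.StringIO(pollution_csv)))
demographic_rows = list(csv.DictReader(io.StringIO(demographics_csv)))
linked = link_by_zip(pollution_rows, demographic_rows)
```

An inner join is used here, so ZIP Codes missing from either table are dropped. A study that needs to track unmatched ZIP Codes would keep them and flag the missing fields instead.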

Additional Info

Field Value
Last Updated April 23, 2025, 19:54 (UTC)
Created April 23, 2025, 19:54 (UTC)
accessLevel public
bureauCode {026:00}
catalog_@context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
catalog_@id https://data.nasa.gov/data.json
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy https://project-open-data.cio.gov/v1.1/schema/catalog.json
citation Wei, Y., X. Xing, A. Shtein, E. Castro, C. Hultquist, M. D. Yazdi, L. Li, and J. Schwartz. 2022-12-09. Daily and Annual PM2.5, O3, and NO2 Concentrations at ZIP Codes for the Contiguous U.S., 2000-2016, v1.0. Version 1.00. Palisades, NY. Archived by National Aeronautics and Space Administration, U.S. Government, NASA Socioeconomic Data and Applications Center (SEDAC). https://doi.org/10.7927/9yp5-hz11.
creator Wei, Y., X. Xing, A. Shtein, E. Castro, C. Hultquist, M. D. Yazdi, L. Li, and J. Schwartz
graphic-preview-description Sample browse graphic of the data set.
graphic-preview-file https://sedac.ciesin.columbia.edu/downloads/maps/aqdh-pm2-5-o3-no2-concentrations-zipcode-contiguous-us-2000-2016/sedac-logo.jpg
harvest_object_id 870d4627-4e4e-4037-9a9d-f81b63ce2e22
harvest_source_id b37e5849-07d2-41cd-8bb6-c6e83fc98f2d
harvest_source_title DNG Legacy Data
identifier C2563727886-SEDAC
issued 2022-12-09
landingPage https://doi.org/10.7927/9yp5-hz11
language {en-US}
metadata_type geospatial
modified 2022-12-09
programCode {026:001}
publisher SEDAC
references {https://doi.org/10.1164/rccm.202107-1596OC,https://doi.org/10.1038/s41586-021-04190-y,https://doi.org/10.1016/j.envres.2022.114636}
release-place Palisades, NY
resource-type Dataset
source_datajson_identifier true
source_hash 54b154eb264de31603dcf2b82692a6274fecfe05a9a3d1ef70e24a787ec08ec3
source_schema_version 1.1
spatial -180.0 17.0 -65.0 72.0
temporal 2000-01-01T00:00:00Z/2016-12-31T00:00:00Z
theme {AQDH,geospatial}