Accessing and using the US GHG Center Data Catalog

The US GHG Center uses a SpatioTemporal Asset Catalog (STAC) to catalog its datasets. This tutorial teaches the basics of searching and accessing datasets using the US GHG Center STAC catalog.

Approach

  • Use the pystac_client library to connect to the data catalog
  • List all the collections from the catalog
  • Look at the items in one of the collections
  • Access an asset in the item
  • Use the US GHG Center TiTiler API to visualize the asset

Libraries Used

PySTAC Client

From the PySTAC Client Documentation:

The STAC Python Client (pystac_client) is a Python package for working with STAC Catalogs and APIs that conform to the STAC and STAC API specs in a seamless way.
PySTAC Client builds upon PySTAC through higher-level functionality and ability to leverage STAC API search endpoints.

We will use it to interact with the data catalog.

Requests

Requests (requests) is a simple HTTP library. Requests allows you to send HTTP requests extremely easily.

We will use it to make API requests.

Here we import all the required libraries and modules.

import requests

from pystac_client import Client

# For displaying images in a Jupyter notebook
from IPython.display import Image, display

Let’s define the data catalog (STAC catalog) API URL

# STAC API root URL
URL = 'https://ghg.center/api/stac'
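Client.open wraps the STAC API's HTTP endpoints for you. For orientation, here is a minimal sketch of the URLs involved. It assumes the GHG Center API exposes the standard STAC API paths (/collections, /collections/{collectionId}/items, /search); `stac_endpoints` is a hypothetical helper, not part of pystac_client.

```python
# A minimal sketch: derive common STAC API endpoint URLs from the root URL.
# Assumption: the API follows the standard STAC API paths; check the landing page to confirm.
from urllib.parse import quote


def stac_endpoints(root, collection_id=None):
    """Return a dict of common STAC API endpoint URLs under `root` (hypothetical helper)."""
    root = root.rstrip("/")
    endpoints = {
        "landing": root,                      # landing page (catalog root)
        "collections": f"{root}/collections", # list of all collections
        "search": f"{root}/search",           # item search endpoint
    }
    if collection_id is not None:
        cid = quote(collection_id, safe="")   # percent-encode the id for use in a path
        endpoints["collection"] = f"{root}/collections/{cid}"
        endpoints["items"] = f"{root}/collections/{cid}/items"
    return endpoints


urls = stac_endpoints("https://ghg.center/api/stac", "eccodarwin-co2flux-monthgrid-v5")
print(urls["items"])  # https://ghg.center/api/stac/collections/eccodarwin-co2flux-monthgrid-v5/items
```

Fetching `urls["collections"]` with requests.get should return, as raw JSON, the same collection metadata that pystac_client reads.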

Use the pystac_client.Client class to connect to the catalog.

catalog = Client.open(URL)
catalog

List all the datasets (collections) in the catalog

collections = list(catalog.get_collections())
for collection in collections:
    # Print each collection id followed by its description
    print(f">> {collection.id}")
    print(f"- {collection.description}\n")
>> epa-ch4emission-yeargrid-v2express-new
- This gridded dataset represents an update to the original version 1 of the gridded GHGI from Maasakkers, et al., (2016). The annual files contain one year of emissions per source category but include a time dimension to make them suitable (COARDS-compliant) for atmospheric models. This main dataset also includes monthly source-specific methane emission scaling factors for those select sources with strong interannual variability.

>> lpjwsl-wetlandch4-monthgrid-v1
- Wetland methane emissions produced by the Lund–Potsdam–Jena Dynamic Global Vegetation Model (LPJ-DGVM) Wald Schnee und Landscaft version (LPJ-wsl). LPJ-wsl is a prognostic model used to simulate future changes in wetland emissions and independently verified with remote sensing data products. The LPJ-wsl model is regularly used in conjunction with NASA’s GEOS model to simulate the impact of wetlands and other methane sources on atmospheric methane concentrations.

>> tm54dvar-ch4flux-monthgrid-v1
- Global, monthly 1 degree resolution methane emission estimates from microbial, fossil and pyrogenic sources derived using inverse modeling, version 1.

>> gosat-based-ch4budget-yeargrid-v1
- Annual methane emissions gridded globally at 1° resolution for 2019, version.

>> sedac-popdensity-yeargrid5yr-v4.11
- The Gridded Population of the World, Version 4 (GPWv4): Population Density, Revision 11 consists of estimates of human population density (number of persons per square kilometer) based on counts consistent with national censuses and population registers, for the years 2000, 2005, 2010, 2015, and 2020. 

>> tm54dvar-ch4flux-mask-monthgrid-v1
- Global, monthly 1 degree resolution methane emission estimates from microbial, fossil and pyrogenic sources derived using inverse modeling, version 1.

>> casagfed-carbonflux-monthgrid-v3
- This product provides Monthly average Net Primary Production (NPP), heterotrophic respiration (Rh), wildfire emissions (FIRE), and fuel wood burning emissions (FUEL) derived from the Carnegie-Ames-Stanford-Approach – Global Fire Emissions Database version 3 (CASA-GFED3) model.

>> oco2geos-co2-daygrid-v10r
- Daily, global 0.5 x 0.625 degree assimilated CO2 concentrations derived from OCO-2 satellite data, version 10r

>> lpjwsl-wetlandch4-daygrid-v1
- Wetland methane emissions produced by the Lund–Potsdam–Jena Dynamic Global Vegetation Model (LPJ-DGVM) Wald Schnee und Landscaft version (LPJ-wsl). LPJ-wsl is a prognostic model used to simulate future changes in wetland emissions and independently verified with remote sensing data products. The LPJ-wsl model is regularly used in conjunction with NASA’s GEOS model to simulate the impact of wetlands and other methane sources on atmospheric methane concentrations.

>> epa-ch4emission-yeargrid-v2express
- This gridded dataset represents an update to the original version 1 of the gridded GHGI from Maasakkers, et al., (2016). The annual files contain one year of emissions per source category but include a time dimension to make them suitable (COARDS-compliant) for atmospheric models. This main dataset also includes monthly source-specific methane emission scaling factors for those select sources with strong interannual variability.

>> emit-ch4plume-v1
- Methane plume complexes from point source emitters

>> oco2-mip-meanco2budget-yeargrid-v1
- National CO2 Budgets (2015-2020) inferred from atmospheric CO2 observations in support of the Global Stocktake

>> oco2-mip-co2budget-yeargrid-v1
- National CO2 Budgets (2015-2020) inferred from atmospheric CO2 observations in support of the Global Stocktake

>> eccodarwin-co2flux-monthgrid-v5
- Global, monthly average air-sea CO2 flux at ~1/3° resolution from 2020 to 2022

>> odiac-ffco2-monthgrid-v2022
- The Open-Data Inventory for Anthropogenic Carbon dioxide (ODIAC) is a high-spatial resolution global emission data product of CO₂ emissions from fossil fuel combustion (Oda and Maksyutov, 2011). ODIAC pioneered the combined use of space-based nighttime light data and individual power plant emission/location profiles to estimate the global spatial extent of fossil fuel CO₂ emissions. With the innovative emission modeling approach, ODIAC achieved the fine picture of global fossil fuel CO₂ emissions at a 1x1km.
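With this many collections, filtering by keyword can help narrow the list. The sketch below relies only on the `.id` and `.description` attributes that pystac Collection objects carry. `find_collections` is a hypothetical helper, and the SimpleNamespace objects are offline stand-ins for two of the collections printed above.

```python
# A minimal sketch: filter collections by a keyword in their id or description.
# `find_collections` is hypothetical; it only needs objects with `.id` and `.description`.
from types import SimpleNamespace


def find_collections(collections, keyword):
    """Return ids of collections whose id or description mentions `keyword` (case-insensitive)."""
    kw = keyword.lower()
    return [
        c.id
        for c in collections
        if kw in c.id.lower() or kw in (c.description or "").lower()
    ]


# Offline stand-ins for two of the collections listed above.
sample = [
    SimpleNamespace(id="emit-ch4plume-v1",
                    description="Methane plume complexes from point source emitters"),
    SimpleNamespace(id="eccodarwin-co2flux-monthgrid-v5",
                    description="Global, monthly average air-sea CO2 flux at ~1/3° resolution from 2020 to 2022"),
]
print(find_collections(sample, "methane"))  # ['emit-ch4plume-v1']
```

On the live list, `find_collections(collections, "ch4")` would also pick up methane datasets whose ids use the ch4 abbreviation but whose descriptions do not.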

Pick a collection to interact with. Let’s pick eccodarwin-co2flux-monthgrid-v5. This dataset represents the Air-Sea CO₂ flux, estimated using the ECCO (Estimating the Circulation and Climate of the Ocean) Darwin model.

Read more about the dataset in the GHG Center Web Portal: https://earth.gov/ghgcenter/data-catalog/eccodarwin-co2flux-monthgrid-v5

collection = catalog.get_collection("eccodarwin-co2flux-monthgrid-v5")
collection

Let’s look at the items within the collection, using the CollectionClient.get_items method

# get_items
items = list(collection.get_items())
for item in items[:10]:
    print(item)
<Item id=eccodarwin-co2flux-monthgrid-v5-202212>
<Item id=eccodarwin-co2flux-monthgrid-v5-202211>
<Item id=eccodarwin-co2flux-monthgrid-v5-202210>
<Item id=eccodarwin-co2flux-monthgrid-v5-202209>
<Item id=eccodarwin-co2flux-monthgrid-v5-202208>
<Item id=eccodarwin-co2flux-monthgrid-v5-202207>
<Item id=eccodarwin-co2flux-monthgrid-v5-202206>
<Item id=eccodarwin-co2flux-monthgrid-v5-202205>
<Item id=eccodarwin-co2flux-monthgrid-v5-202204>
<Item id=eccodarwin-co2flux-monthgrid-v5-202203>
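The item ids above end in a YYYYMM suffix. Assuming that pattern holds across the whole collection, a specific month can be requested by id rather than by its position in the list. `item_id_for` and `month_of` are hypothetical helpers; for authoritative timestamps, prefer the item's own `datetime` property in pystac.

```python
# A minimal sketch, assuming item ids follow the "<collection-id>-YYYYMM" pattern seen above.
# `item_id_for` and `month_of` are hypothetical helpers.
from datetime import date

COLLECTION_ID = "eccodarwin-co2flux-monthgrid-v5"


def item_id_for(collection_id, year, month):
    """Build the expected item id for a given year and month."""
    return f"{collection_id}-{year:04d}{month:02d}"


def month_of(item_id):
    """Parse the trailing YYYYMM suffix into a date on the first of that month."""
    suffix = item_id.rsplit("-", 1)[-1]
    return date(int(suffix[:4]), int(suffix[4:6]), 1)


print(item_id_for(COLLECTION_ID, 2022, 6))                  # eccodarwin-co2flux-monthgrid-v5-202206
print(month_of("eccodarwin-co2flux-monthgrid-v5-202212"))   # 2022-12-01
```

For example, `collection.get_item(item_id_for(COLLECTION_ID, 2022, 6))` should return the June 2022 item, provided that id exists in the catalog.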
item = collection.get_item(items[0].id)   # item = collection.get_item("eccodarwin-co2flux-monthgrid-v5-202212")
item
asset = item.assets["co2"].href
asset
's3://ghgc-data-store/eccodarwin-co2flux-monthgrid-v5/ECCO-Darwin_CO2_flux_202212.tif'

Visualize this asset using TiTiler

# Define the TiTiler url
TITILER_URL = "https://ghg.center/api/raster"
# TiTiler preview endpoint
cog_preview = f"/cog/preview.png?url={asset}&rescale=-0.0007,0.0002&colormap_name=bwr"
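The f-string above places the s3:// URL in the query string without percent-encoding it. That works for this asset, but an asset path containing characters such as `&` or spaces would break the query. A minimal sketch of the same request URL built with urllib.parse.urlencode, using the asset href shown above:

```python
# A minimal sketch: build the TiTiler preview URL with proper percent-encoding.
from urllib.parse import urlencode, urlparse, parse_qs

TITILER_URL = "https://ghg.center/api/raster"
asset = "s3://ghgc-data-store/eccodarwin-co2flux-monthgrid-v5/ECCO-Darwin_CO2_flux_202212.tif"

params = {
    "url": asset,                 # COG location for TiTiler to read
    "rescale": "-0.0007,0.0002",  # min,max data values stretched across the colormap
    "colormap_name": "bwr",       # blue-white-red diverging colormap
}
preview_url = f"{TITILER_URL}/cog/preview.png?{urlencode(params)}"
print(preview_url)

# Decoding the query string recovers the original parameters unchanged.
decoded = {k: v[0] for k, v in parse_qs(urlparse(preview_url).query).items()}
assert decoded == params
```

Alternatively, `requests.get(f"{TITILER_URL}/cog/preview.png", params=params)` lets requests handle the encoding.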

Use requests.get to make a GET request for the preview.

response = requests.get(f"{TITILER_URL}{cog_preview}") #https://ghg.center/api/raster/cog/preview.png?url=s3://ghgc-data-store/eccodarwin-co2flux-monthgrid-v5/ECCO-Darwin_CO2_flux_202212.tif&rescale=-0.0007,0.0002&colormap_name=bwr
response
<Response [200]>
display(Image(response.content))
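Before displaying or saving the response, it can be worth confirming that TiTiler actually returned an image; on failure the body is typically a JSON error message rather than PNG bytes. A minimal sketch, where `save_png` is a hypothetical helper that checks the 8-byte PNG file signature:

```python
# A minimal sketch: save preview bytes to disk only if they carry the PNG signature.
# `save_png` is a hypothetical helper, not part of requests or IPython.
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def save_png(content: bytes, path: str) -> Path:
    """Write `content` to `path`, raising ValueError if it is not a PNG image."""
    if not content.startswith(PNG_SIGNATURE):
        raise ValueError("response body is not a PNG image")
    out = Path(path)
    out.write_bytes(content)
    return out
```

In the notebook, calling `response.raise_for_status()` first and then `save_png(response.content, "eccodarwin_202212.png")` would write the preview to disk.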

Searching for items based on an area of interest (AOI) and/or datetime range
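The datetime filter is an interval of the form "start/end", and the STAC API item-search spec uses ".." for an open-ended bound. pystac_client accepts date-only strings such as "2012-04-01" and expands them to full timestamps before sending the request. A minimal sketch of building the interval string from Python dates; `datetime_interval` is a hypothetical helper:

```python
# A minimal sketch: format a STAC datetime interval "start/end", using ".." for open bounds.
# `datetime_interval` is a hypothetical helper.
from datetime import date
from typing import Optional


def datetime_interval(start: Optional[date], end: Optional[date]) -> str:
    """Return an interval string for the `datetime` search parameter."""
    lo = start.isoformat() if start is not None else ".."
    hi = end.isoformat() if end is not None else ".."
    return f"{lo}/{hi}"


print(datetime_interval(date(2012, 4, 1), date(2015, 12, 31)))  # 2012-04-01/2015-12-31
print(datetime_interval(date(2020, 1, 1), None))                # 2020-01-01/..
```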

# Rough AOI for the Baltimore, MD area
baltimore_aoi = {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -76.7413596126004,
          39.37730408865011
        ],
        [
          -76.7413596126004,
          39.20129583511198
        ],
        [
          -76.47249934044682,
          39.20129583511198
        ],
        [
          -76.47249934044682,
          39.37730408865011
        ],
        [
          -76.7413596126004,
          39.37730408865011
        ]
      ]
    ],
}
# Search the catalog for the given collection, AOI and datetime range
search = catalog.search(
    max_items=100,
    collections="epa-ch4emission-yeargrid-v2express",
    intersects=baltimore_aoi,
    datetime="2012-04-01/2015-12-31",
)
items = list(search.item_collection())

len(items)
100
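The baltimore_aoi polygon used in this search is an axis-aligned rectangle, so it can also be generated from its bounding box. A minimal sketch; `bbox_to_polygon` is a hypothetical helper, and its vertex order matches baltimore_aoi exactly:

```python
# A minimal sketch: build a closed GeoJSON Polygon from a bounding box.
# `bbox_to_polygon` is hypothetical; argument order is west, south, east, north.
def bbox_to_polygon(west, south, east, north):
    ring = [
        [west, north],
        [west, south],
        [east, south],
        [east, north],
        [west, north],  # GeoJSON rings must end at their starting point
    ]
    return {"type": "Polygon", "coordinates": [ring]}


aoi = bbox_to_polygon(-76.7413596126004, 39.20129583511198,
                      -76.47249934044682, 39.37730408865011)
```

For rectangular AOIs, catalog.search also accepts `bbox=[west, south, east, north]` directly in place of `intersects`.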
items[:5]
[<Item id=oco2geos-co2-daygrid-v10r-20151231>,
 <Item id=lpjwsl-wetlandch4-daygrid-v1-20151231>,
 <Item id=oco2geos-co2-daygrid-v10r-20151230>,
 <Item id=lpjwsl-wetlandch4-daygrid-v1-20151230>,
 <Item id=oco2geos-co2-daygrid-v10r-20151229>]
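The first five items listed above belong to the oco2geos-co2-daygrid-v10r and lpjwsl-wetlandch4-daygrid-v1 collections rather than the requested epa-ch4emission-yeargrid-v2express collection, which suggests the collection filter was not applied as expected. A minimal client-side check, assuming pystac Item objects (which expose `collection_id`); the helper names are hypothetical, and the SimpleNamespace objects are offline stand-ins mirroring that output:

```python
# A minimal sketch: verify which collections a search actually returned.
# `items_in_collection` and `count_by_collection` are hypothetical helpers.
from collections import Counter
from types import SimpleNamespace


def items_in_collection(items, collection_id):
    """Keep only items whose collection_id matches the requested collection."""
    return [it for it in items if it.collection_id == collection_id]


def count_by_collection(items):
    """Count returned items per collection id."""
    return Counter(it.collection_id for it in items)


# Offline stand-ins mirroring two of the items printed above.
sample = [
    SimpleNamespace(id="oco2geos-co2-daygrid-v10r-20151231",
                    collection_id="oco2geos-co2-daygrid-v10r"),
    SimpleNamespace(id="lpjwsl-wetlandch4-daygrid-v1-20151231",
                    collection_id="lpjwsl-wetlandch4-daygrid-v1"),
]
print(items_in_collection(sample, "epa-ch4emission-yeargrid-v2express"))  # []
```

Running `count_by_collection(items)` on the live results shows at a glance which collections the search actually returned.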

Summary

In this notebook, we used the pystac_client library to interact with the US GHG Center Data Catalog. We learned how to list all the datasets, list the items in a dataset, read the metadata for a dataset and its items, access a data file, visualize it with TiTiler, and finally search the catalog for a specific area of interest and datetime range.