# Import the following libraries
import requests
import folium
import folium.plugins
from folium import Map, TileLayer
from pystac_client import Client
import branca
import pandas as pd
import matplotlib.pyplot as plt
Air-Sea CO₂ Flux, ECCO-Darwin Model v5
Approach
- Identify available dates and temporal frequency of observations for the given collection using the GHGC API /stac endpoint. The collection processed in this notebook is the Air-Sea CO₂ Flux, ECCO-Darwin Model v5 data product.
- Pass the STAC item into the raster API /stac/tilejson.json endpoint.
- Using folium.plugins.DualMap, we will visualize two tiles (side-by-side), allowing us to compare time points.
- After the visualization, we will perform zonal statistics for a given polygon.
About the Data
The ocean is a major sink for atmospheric carbon dioxide (CO₂), largely due to the presence of phytoplankton that use the CO₂ to grow. Studies have shown that global ocean CO₂ uptake has increased over recent decades; however, there is uncertainty in the various mechanisms that affect ocean CO₂ flux and storage and in how the ocean carbon sink will respond to future climate change. Because CO₂ fluxes can vary significantly across space and time, and because ocean and atmosphere CO₂ observations are deficient, there is a need for models that can thoroughly represent these processes. Ocean biogeochemical models (OBMs) can resolve the physical and biogeochemical mechanisms contributing to spatial and temporal variations in air-sea CO₂ fluxes, but previous OBMs did not integrate observations to improve model accuracy and have not been able to operate on the seasonal and multi-decadal timescales needed to adequately characterize these processes. The ECCO-Darwin model is an OBM that assimilates Estimating the Circulation and Climate of the Ocean (ECCO) consortium ocean circulation estimates and biogeochemical processes from the Massachusetts Institute of Technology (MIT) Darwin Project. A pilot study using ECCO-Darwin was completed by Brix et al. (2015); however, an improved version of the model was developed by Carroll et al. (2020), in which issues present in the first model were addressed using data assimilation and adjustments were made to initial conditions and biogeochemical parameters. The updated ECCO-Darwin model was compared with interpolation-based products to estimate surface ocean partial pressure (pCO₂) and air-sea CO₂ flux. This dataset contains the gridded global, monthly mean air-sea CO₂ fluxes from version 5 of the ECCO-Darwin model. The data are available at ~1/3° horizontal resolution at the equator (~18 km at high latitudes) from January 2020 through December 2022.
For more information regarding this dataset, please visit the Air-Sea CO₂ Flux ECCO-Darwin Model data overview page.
Install the Required Libraries
Required libraries are pre-installed on the GHG Center Hub. If you need to run this notebook elsewhere, please install them with this line in a code cell:
%pip install requests folium rasterstats pystac_client pandas matplotlib --quiet
Querying the STAC API
First, we import the required libraries. Once imported, they let us run a query against the GHG Center SpatioTemporal Asset Catalog (STAC) Application Programming Interface (API), where the granules for this collection are stored.
# Provide the STAC and RASTER API endpoints
# An endpoint is a location within the API that executes a request on a data collection residing on the server.
# The STAC API is a catalog of all the existing data collections that are stored in the GHG Center.
STAC_API_URL = "http://ghg.center/api/stac"
# The RASTER API is used to fetch collections for visualization
RASTER_API_URL = "https://ghg.center/api/raster"
# The collection name is used to fetch the dataset from the STAC API. First, we define the collection name as a variable
# Name of the collection for ECCO Darwin CO₂ flux monthly emissions
collection_name = "eccodarwin-co2flux-monthgrid-v5"
# Fetch the collection from the STAC API using the appropriate endpoint
# The 'requests' library makes the HTTP request possible
collection = requests.get(f"{STAC_API_URL}/collections/{collection_name}").json()
# Print the properties of the collection to the console
collection
Examining the contents of our collection under the temporal variable, we see that the data is available from January 2020 to December 2022. By looking at the dashboard:time density, we observe that the data is periodic with monthly time density.
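The temporal extent can also be read programmatically instead of by eye. The snippet below is a minimal sketch that works on a hand-written stub shaped like a STAC collection response; the stub's dates mirror the range quoted above, and the key name dashboard:time_density is an assumption about how the field is spelled in the live response.

```python
from datetime import datetime

# Hypothetical stub shaped like a STAC collection response.
# It is NOT fetched from the API; the dates mirror the range quoted in the text.
sample_collection = {
    "id": "eccodarwin-co2flux-monthgrid-v5",
    "extent": {
        "temporal": {
            "interval": [["2020-01-01T00:00:00Z", "2022-12-31T00:00:00Z"]]
        }
    },
    "dashboard:time_density": "month",  # assumed key spelling
}


def _parse(timestamp):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def temporal_range(collection):
    """Return (start, end) datetimes from the first temporal interval."""
    start_str, end_str = collection["extent"]["temporal"]["interval"][0]
    return _parse(start_str), _parse(end_str)


start, end = temporal_range(sample_collection)
# Number of calendar months covered, counting both endpoints
months_covered = (end.year - start.year) * 12 + (end.month - start.month) + 1
print(start.date(), end.date(), months_covered)
```

With a monthly time density, the number of months covered should match the number of granules counted in the next step.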
# Create a function that would search for a data collection in the US GHG Center STAC API
# First, we need to define the function
# The name of the function is "get_item_count"
# The argument that will be passed to the defined function is "collection_id"
def get_item_count(collection_id):
    # Set a counter for the number of items existing in the collection
    count = 0
    # Define the path to retrieve the granules (items) of the collection of interest (Air-Sea CO2 Flux ECCO-Darwin model) in the STAC API
    items_url = f"{STAC_API_URL}/collections/{collection_id}/items"
    # Run a while loop to make HTTP requests until there are no more URLs associated with the collection (Air-Sea CO2 Flux ECCO-Darwin model) in the STAC API
    while True:
        # Retrieve information about the granules by sending a "get" request to the STAC API using the defined collection path
        response = requests.get(items_url)
        # If the items do not exist, print an error message and quit the loop
        if not response.ok:
            print("error getting items")
            exit()
        # Return the results of the HTTP response as JSON
        stac = response.json()
        # Increase the "count" by the number of items (granules) returned in the response
        count += int(stac["context"].get("returned", 0))
        # Retrieve information about the next URL associated with the collection (Air-Sea CO2 Flux ECCO-Darwin model) in the STAC API (if applicable)
        next = [link for link in stac["links"] if link["rel"] == "next"]
        # Exit the loop if there are no other URLs
        if not next:
            break
        # Ensure the information gathered by other STAC API links associated with the collection are added to the original path
        # "href" is the identifier for each of the tiles stored in the STAC API
        items_url = next[0]["href"]
    # Return the information about the total number of granules found associated with the collection (Air-Sea CO2 Flux ECCO-Darwin model)
    return count
# Apply the function created above "get_item_count" to the Air-Sea CO2 Flux ECCO-Darwin collection
number_of_items = get_item_count(collection_name)
# Get the information about the number of granules found in the collection
items = requests.get(f"{STAC_API_URL}/collections/{collection_name}/items?limit={number_of_items}").json()["features"]
# Print the total number of items (granules) found
print(f"Found {len(items)} items")
# Examine the first item in the collection
# Keep in mind that a list starts from 0, 1, 2... therefore items[0] is referring to the first item in the list/collection
items[0]
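The pagination logic in get_item_count can be exercised without network access by injecting the function that fetches a page. The sketch below is one possible variant, not the notebook's own implementation: count_items and fetch_json are hypothetical names, the two fake pages are made up, and it counts the features on each page rather than reading the context field.

```python
def count_items(first_url, fetch_json):
    """Follow STAC "next" links and sum the features found on each page.

    fetch_json is any callable that maps a URL to a parsed JSON dict,
    for example: lambda url: requests.get(url, timeout=30).json()
    """
    count = 0
    url = first_url
    while url:
        page = fetch_json(url)
        count += len(page.get("features", []))
        # Keep only the link that points at the following page, if any
        next_links = [
            link for link in page.get("links", []) if link.get("rel") == "next"
        ]
        url = next_links[0]["href"] if next_links else None
    return count


# Hypothetical two-page response, used only to exercise the loop offline
fake_pages = {
    "page1": {"features": [{}, {}, {}], "links": [{"rel": "next", "href": "page2"}]},
    "page2": {"features": [{}, {}], "links": []},
}
total = count_items("page1", fake_pages.__getitem__)
print(total)
```

Passing the fetcher as an argument keeps the loop testable and avoids calling exit() from inside a helper; an HTTP error can instead surface as an exception from the fetcher.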
Exploring Changes in CO₂ Levels Using the Raster API
In this notebook, we will explore how air-sea CO₂ flux changes over time. We will visualize the outputs on a map using folium.
# Now we create a dictionary where the start datetime values for each granule is queried more explicitly by year and month (e.g., 2020-02)
items = {item["properties"]["start_datetime"]: item for item in items}
# Next, we need to specify the asset name for this collection.
# The asset name is referring to the raster band containing the pixel values for the parameter of interest.
# For the case of the Air-Sea CO2 Flux ECCO-Darwin collection, the parameter of interest is "co2".
asset_name = "co2"
Below, we are entering the minimum and maximum values to provide our upper and lower bounds in the rescale_values.
# Fetch the minimum and maximum values for the CO2 value range
rescale_values = {"max":0.0007, "min":-0.0007}
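To see what these bounds do, the sketch below maps a few flux values onto the 0 to 1 range that a colormap consumes. It assumes the Raster API applies a linear stretch with clipping between the two bounds; the helper name rescale_to_unit is hypothetical and the sample flux values are arbitrary.

```python
def rescale_to_unit(value, vmin, vmax):
    """Linearly map value from [vmin, vmax] onto [0, 1], clipping outliers."""
    fraction = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, fraction))


# Same bounds as rescale_values above
bounds = {"max": 0.0007, "min": -0.0007}

# Arbitrary sample values: one below the range, the rest inside it
for flux in (-0.001, -0.0007, 0.0, 0.00035, 0.0007):
    position = rescale_to_unit(flux, bounds["min"], bounds["max"])
    print(flux, round(position, 3))
```

Under this assumption, a flux of zero lands in the middle of the color ramp, and anything beyond ±0.0007 saturates at the end colors.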
Now, we will pass the item id, collection name, asset name, and the rescaling factor to the Raster API endpoint. This step is done twice so that we can visualize two arbitrary events independently.
# Choose a color map for displaying the first observation (event)
# Please refer to matplotlib library if you'd prefer choosing a different color ramp.
# For more information on Colormaps in Matplotlib, please visit https://matplotlib.org/stable/users/explain/colors/colormaps.html
color_map = "magma"
# Make a GET request to retrieve information for the December 2022 tile which is the 1st item in the collection
# To retrieve the first item in the collection we use "0" in the "(items.keys())[0]" statement
# If you want to select another item (granule) in the list (collection), you can refer to the Data Browser in the U.S. Greenhouse Gas Center website
# URL to the Air-Sea CO2 Flux ECCO-Darwin collection in the US GHG Center: https://dljsq618eotzp.cloudfront.net/browseui/#eccodarwin-co2flux-monthgrid-v5/
# A GET request is made for the December 2022 tile
december_2022_tile = requests.get(
    # Pass the collection name, the item number in the list, and its ID
    f"{RASTER_API_URL}/stac/tilejson.json?collection={items[list(items.keys())[0]]['collection']}&item={items[list(items.keys())[0]]['id']}"
    # Pass the asset name
    f"&assets={asset_name}"
    # Pass the color formula and colormap for custom visualization
    f"&color_formula=gamma+r+1.05&colormap_name={color_map}"
    # Pass the minimum and maximum values for rescaling
    f"&rescale={rescale_values['min']},{rescale_values['max']}",
# Return the response in JSON format
).json()
# Print the properties of the retrieved granule to the console
december_2022_tile
# Make a GET request to retrieve information for the April 2021 tile which is the 21st item in the collection
# To retrieve the 21st item in the collection we use "20" in the "(items.keys())[20]" statement
# Keep in mind that a list starts from 0, therefore "items[20]" is referring to the 21st item in the list/collection
# A GET request is made for the April 2021 tile
april_2021_tile = requests.get(
    # Pass the collection name, the item number in the list, and its ID
    f"{RASTER_API_URL}/stac/tilejson.json?collection={items[list(items.keys())[20]]['collection']}&item={items[list(items.keys())[20]]['id']}"
    # Pass the asset name
    f"&assets={asset_name}"
    # Pass the color formula and colormap for custom visualization
    f"&color_formula=gamma+r+1.05&colormap_name={color_map}"
    # Pass the minimum and maximum values for rescaling
    f"&rescale={rescale_values['min']},{rescale_values['max']}",
# Return the response in JSON format
).json()
# Print the properties of the retrieved granule to the console
april_2021_tile
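The two requests above differ only in which item they point at, so the URL assembly could be factored into a helper. The following is a minimal sketch using only the standard library; build_tilejson_url is a hypothetical name, and the item id passed in the usage line is a placeholder rather than a real granule id.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_tilejson_url(raster_api_url, collection, item_id, asset,
                       colormap_name, vmin, vmax,
                       color_formula="gamma r 1.05"):
    """Assemble the /stac/tilejson.json URL with URL-encoded query parameters."""
    query = urlencode({
        "collection": collection,
        "item": item_id,
        "assets": asset,
        "color_formula": color_formula,  # spaces are encoded as "+"
        "colormap_name": colormap_name,
        "rescale": f"{vmin},{vmax}",
    })
    return f"{raster_api_url}/stac/tilejson.json?{query}"


url = build_tilejson_url(
    "https://ghg.center/api/raster",
    "eccodarwin-co2flux-monthgrid-v5",
    "example-item-id",  # hypothetical item id, for illustration only
    "co2",
    "magma",
    -0.0007,
    0.0007,
)
print(url)

# Decode the query string again to confirm the parameters round-trip
decoded = parse_qs(urlsplit(url).query)
print(decoded["rescale"], decoded["color_formula"])
```

The resulting string could then be passed to requests.get(url).json() in place of the hand-concatenated f-strings.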
Visualizing CO₂ flux Emissions
# For this study we are going to compare the CO2 level in 2021 and 2022 along the coast of California
# To change the location, you can simply insert the latitude and longitude of the area of your interest in the "location=(LAT, LONG)" statement
# Set the initial zoom level and center of map for both tiles
# 'folium.plugins' allows mapping side-by-side
map_ = folium.plugins.DualMap(location=(34, -118), zoom_start=6)
# Define the first map layer with the CO2 Flux data for December 2022
map_layer_1 = TileLayer(
    tiles=december_2022_tile["tiles"][0], # Path to retrieve the tile
    attr="GHG", # Set the attribution
    name='December 2022 CO2 Flux', # Title for the layer
    overlay=True, # The layer can be overlaid on the map
    opacity=0.8, # Adjust the transparency of the layer
)
# Add the first layer to the Dual Map
map_layer_1.add_to(map_.m1)
# Define the second map layer with the CO2 Flux data for April 2021
map_layer_2 = TileLayer(
    tiles=april_2021_tile["tiles"][0], # Path to retrieve the tile
    attr="GHG", # Set the attribution
    name='April 2021 CO2 Flux', # Title for the layer
    overlay=True, # The layer can be overlaid on the map
    opacity=0.8, # Adjust the transparency of the layer
)
# Add the second layer to the Dual Map
map_layer_2.add_to(map_.m2)
# Display data markers (titles) on both maps
folium.Marker((40, 5.0), tooltip="both").add_to(map_)
# Add a layer control to switch between map layers
folium.LayerControl(collapsed=False).add_to(map_)
# Add a legend to the dual map using the 'branca' library
# Note: the inserted legend is representing the minimum and maximum values for both tiles
# Minimum value = -0.0007, maximum value = 0.0007
colormap = branca.colormap.LinearColormap(colors=["#0000FF", "#3399FF", "#66CCFF", "#FFFFFF", "#FF66CC", "#FF3399", "#FF0000"], vmin=-0.0007, vmax=0.0007)
# Add the data unit as caption
colormap.caption = 'Millimoles per meter squared per second (mmol/m²/s)'
# Define custom tick values for the legend bar
tick_val = [-0.0007, -0.00035, 0, 0.00035, 0.0007]
# Create a HTML representation
legend_html = colormap._repr_html_()
# Create a customized HTML structure for the legend
legend_html = f'''
<div style="position: fixed; bottom: 50px; left: 50px; z-index: 1000; width: 400px; height: auto; background-color: rgba(255, 255, 255, 0.8);
border-radius: 5px; border: 1px solid grey; padding: 10px; font-size: 14px; color: black;">
<b>{colormap.caption}</b><br>
<div style="display: flex; justify-content: space-between;">
<div>{tick_val[0]}</div>
<div>{tick_val[1]}</div>
<div>{tick_val[2]}</div>
<div>{tick_val[3]}</div>
<div>{tick_val[4]}</div>
</div>
<div style="background: linear-gradient(to right,
{'#0000FF'}, {'#3399FF'} {20}%,
{'#3399FF'} {20}%, {'#66CCFF'} {40}%,
{'#66CCFF'} {40}%, {'#FFFFFF'} {50}%,
{'#FFFFFF'} {50}%, {'#FF66CC'} {80}%,
{'#FF66CC'} {80}%, {'#FF3399'}); height: 10px;"></div>
</div>
'''
# Display the legend and caption on the map
map_.get_root().html.add_child(folium.Element(legend_html))
# Visualize the Dual Map
map_
Calculating Zonal Statistics
To perform zonal statistics, first we need to create a polygon. In this use case we are creating a polygon along the coast of California, United States.
# Create a polygon for the area of interest (aoi)
california_coast_aoi = {
    "type": "Feature", # Create a feature object
    "properties": {},
    "geometry": { # Set the bounding coordinates for the polygon
        "coordinates": [
            [
                [-124.19, 37.86], # North-west bounding coordinate
                [-123.11, 37.86], # North-east bounding coordinate
                [-119.96, 33.16], # South-east bounding coordinate
                [-121.13, 33.16], # South-west bounding coordinate
                [-124.19, 37.86] # North-west bounding coordinate (closing the polygon)
            ]
        ],
        "type": "Polygon",
    },
}
# Create a new map to display the generated polygon
aoi_map = Map(
    # Base map is set to OpenStreetMap
    tiles="OpenStreetMap",
    # Define the spatial properties for the map
    location=[
        # Set the center of the map
        35, -120
    ],
    # Set the zoom value
    zoom_start=7,
)
# Insert the Coastal California polygon to the map
folium.GeoJson(california_coast_aoi, name="Coastal California").add_to(aoi_map)
# Visualize the map
aoi_map
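Before sending a hand-typed polygon to the API, it can help to confirm that its ring is closed and to compute its bounding box as a sanity check. The sketch below is an optional addition using plain Python; ring_is_closed and bounding_box are hypothetical helper names, and the coordinates are copied from the polygon defined above.

```python
def ring_is_closed(ring):
    """A GeoJSON linear ring needs at least 4 positions and must end where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def bounding_box(ring):
    """Return (min_lon, min_lat, max_lon, max_lat) for a ring of [lon, lat] pairs."""
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return min(lons), min(lats), max(lons), max(lats)


# Same coordinates as the Coastal California polygon, in [longitude, latitude] order
ring = [
    [-124.19, 37.86],
    [-123.11, 37.86],
    [-119.96, 33.16],
    [-121.13, 33.16],
    [-124.19, 37.86],
]
print(ring_is_closed(ring), bounding_box(ring))
```

For the notebook's own dictionary, the ring is california_coast_aoi["geometry"]["coordinates"][0].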
Now that we created the polygon for the area of interest, we need to develop a function that runs through the data collection and generates the statistics for a specific item (granule) within the boundaries of the AOI polygon.
# The bounding box should be passed to the geojson param as a geojson Feature or FeatureCollection
# Create a function that retrieves information regarding a specific granule using its asset name and raster identifier and generates the statistics for it
# The function takes an item (granule) and a JSON (Coastal California polygon) as input parameters
def generate_stats(item, geojson):
    # A POST request is made to submit the data associated with the item of interest (specific observation) within the Coastal California boundaries to compute its statistics
    result = requests.post(
        # Raster API Endpoint for computing statistics
        f"{RASTER_API_URL}/cog/statistics",
        # Pass the URL to the item, asset name, and raster identifier as parameters
        params={"url": item["assets"][asset_name]["href"]},
        # Send the GeoJSON object (Coastal California polygon) along with the request
        json=geojson,
    # Return the response in JSON format
    ).json()
    # Print the result
    print(result)
    # Return a dictionary containing the computed statistics along with the item's datetime information.
    return {
        **result["properties"],
        "datetime": item["properties"]["start_datetime"],
    }
Before we run the generated function in the previous step on a specific item (observation), we first check the total number of items available within the collection and retrieve the information regarding their start datetime.
# Check total number of items available within the collection
items = requests.get(
    f"{STAC_API_URL}/collections/{collection_name}/items?limit=600"
).json()["features"]
# Print the total number of items (granules) found
print(f"Found {len(items)} items")
# Examine the first item in the collection
items[0]
# Generate a for loop that iterates over all the existing items in the collection
for item in items:
    # The loop will then retrieve the information for the start datetime of each item in the list
    print(item["properties"]["start_datetime"])
    # Exit the loop after printing the start datetime for the first item in the collection
    break
Generate the statistics for the AOI
%%time
# %%time = Wall time (execution time) for running the code below
# Generate statistics using the created function "generate_stats" within the bounding box defined by the "california_coast_aoi" polygon
stats = [generate_stats(item, california_coast_aoi) for item in items]
# Print the stats for the first item in the collection
stats[0]
Create a function that goes through every single item in the collection and populates their properties - including the minimum, maximum, and sum of their values - in a table.
# Create a function that converts statistics in JSON format into a pandas DataFrame
def clean_stats(stats_json) -> pd.DataFrame:
    # Normalize the JSON data
    df = pd.json_normalize(stats_json)
    # Replace the naming "statistics.b1" in the columns
    df.columns = [col.replace("statistics.b1.", "") for col in df.columns]
    # Set the datetime format
    df["date"] = pd.to_datetime(df["datetime"])
    # Return the cleaned format
    return df
# Apply the generated function on the stats data
df = clean_stats(stats)
# Display the stats for the first 5 granules in the collection in the table
# Change the value in the parenthesis to show more or a smaller number of rows in the table
df.head(5)
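To make the column renaming in clean_stats easier to follow, the sketch below reproduces the flattening step in plain Python on a single made-up record. The flatten helper is a hypothetical stand-in that imitates the dotted column names pd.json_normalize produces; the record's shape follows the "statistics.b1." prefix used above, and its numbers are invented for illustration.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys, similar in spirit to pd.json_normalize."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical record shaped like one element of `stats`; the numbers are made up
sample = {
    "statistics": {"b1": {"min": -0.0002, "max": 0.0003, "mean": 0.0001}},
    "datetime": "2022-12-01T00:00:00+00:00",
}

flat = flatten(sample)
# Strip the band prefix so the columns read "min", "max", "mean"
row = {key.replace("statistics.b1.", ""): value for key, value in flat.items()}
print(sorted(flat))
print(row)
```

Keys that do not carry the prefix, such as datetime, pass through the rename unchanged.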
Visualizing the Data as a Time Series
We can now explore the air-sea CO₂ flux time series (January 2020 - December 2022) available for the Coastal California area of the U.S. We can plot the data set using the code below:
# Sort the DataFrame by the datetime column so the plot is displaying the values from left to right (2020 -> 2022)
df_sorted = df.sort_values(by="datetime")
# Plot the timeseries analysis of the monthly air-sea CO₂ flux changes along the coast of California
# Figure size: 20 representing the width, 10 representing the height
fig = plt.figure(figsize=(20, 10))
plt.plot(
    df_sorted["datetime"], # X-axis: sorted datetime
    df_sorted["max"], # Y-axis: maximum CO₂ value
    color="purple", # Line color
    linestyle="-", # Line style
    linewidth=1, # Line width
    label="CO2 Emissions", # Legend label
)
# Display legend
plt.legend()
# Insert label for the X-axis
plt.xlabel("Years")
# Insert label for the Y-axis
plt.ylabel("CO2 Emissions mmol/m²/s")
# Insert title for the plot
plt.title("CO2 Emission Values for Coastal California (2020-2022)")
# Rotate x-axis labels to avoid cramping
plt.xticks(rotation=90)
# Add data citation
plt.text(
    df_sorted["datetime"].iloc[0], # X-coordinate of the text (first datetime value)
    df_sorted["max"].min(), # Y-coordinate of the text (minimum CO2 value)
    # Text to be displayed
    "Source: NASA Air-Sea CO₂ Flux, ECCO-Darwin Model v5",
    fontsize=12, # Font size
    horizontalalignment="left", # Horizontal alignment
    verticalalignment="bottom", # Vertical alignment
    color="blue", # Text color
)
# Plot the time series
plt.show()
Looking at the plot above, we notice that CO₂ emission level increases particularly around 2022-09-01 for the defined area of interest. To take a closer look at monthly CO₂ flux variability across this region, we are going to retrieve and display data collected during the September 2022 observation.
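The peak month can also be located in code rather than read off the plot, and the matching granule can be found by date instead of by a fixed list position. The sketch below is one way to do this on made-up stand-in data; peak_month and index_of_month are hypothetical helpers, and the values in rows are invented, not taken from the dataset.

```python
# Hypothetical rows shaped like the cleaned statistics table; values are made up
rows = [
    {"datetime": "2022-08-01T00:00:00+00:00", "max": 0.00021},
    {"datetime": "2022-09-01T00:00:00+00:00", "max": 0.00048},
    {"datetime": "2022-10-01T00:00:00+00:00", "max": 0.00030},
]

# Hypothetical items shaped like STAC features, newest first
fake_items = [
    {"properties": {"start_datetime": "2022-10-01T00:00:00+00:00"}},
    {"properties": {"start_datetime": "2022-09-01T00:00:00+00:00"}},
    {"properties": {"start_datetime": "2022-08-01T00:00:00+00:00"}},
]


def peak_month(records, column="max"):
    """Return the YYYY-MM of the record with the largest value in `column`."""
    best = max(records, key=lambda record: record[column])
    return best["datetime"][:7]


def index_of_month(items, month):
    """Return the position of the first item whose start_datetime begins with `month`."""
    for position, item in enumerate(items):
        if item["properties"]["start_datetime"].startswith(month):
            return position
    raise ValueError(f"no item found for {month}")


month = peak_month(rows)
position = index_of_month(fake_items, month)
print(month, position)
```

On the real table, the equivalent lookup would be df.loc[df["max"].idxmax(), "datetime"], and index_of_month(items, "2022-09") would replace a hard-coded list index.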
# The 2022-09-01 observation is the 4th item in the list.
# Considering that a list starts with "0", we need to insert "3" in the "items[3]" statement
print(items[3]["properties"]["start_datetime"])
# A GET request is made for the September 2022 tile
September2022_co2_flux = requests.get(
    # Pass the collection name, the item number in the list, and its ID
    f"{RASTER_API_URL}/stac/tilejson.json?collection={items[3]['collection']}&item={items[3]['id']}"
    # Pass the asset name
    f"&assets={asset_name}"
    # Pass the color formula and colormap for custom visualization
    f"&color_formula=gamma+r+1.05&colormap_name={color_map}"
    # Pass the minimum and maximum values for rescaling
    f"&rescale={rescale_values['min']},{rescale_values['max']}",
# Return the response in JSON format
).json()
# Print the properties of the retrieved granule to the console
September2022_co2_flux
# Create a new map to display the September 2022 tile
aoi_map_bbox = Map(
    # Base map is set to OpenStreetMap
    tiles="OpenStreetMap",
    # Set the center of the map
    location=[
        34, -120
    ],
    # Set the zoom value
    zoom_start=5.5,
)
# Define the map layer with the CO2 flux data for September 2022
map_layer = TileLayer(
    tiles=September2022_co2_flux["tiles"][0], # Path to retrieve the tile
    attr="GHG", # Set the attribution
    opacity=0.7, # Adjust the transparency of the layer
)
# Add the layer to the map
map_layer.add_to(aoi_map_bbox)
# Add a legend to the map
# Minimum value = -0.0007, maximum value = 0.0007
colormap = branca.colormap.LinearColormap(colors=["#0000FF", "#3399FF", "#66CCFF", "#FFFFFF", "#FF66CC", "#FF3399", "#FF0000"], vmin=-0.0007, vmax=0.0007)
# Add the data unit as caption
colormap.caption = 'Millimoles per meter squared per second (mmol/m²/s)'
# Define custom tick values for the legend bar
tick_val = [-0.0007, -0.00035, 0, 0.00035, 0.0007]
# Create a HTML representation
legend_html = colormap._repr_html_()
# Create a customized HTML structure for the legend
legend_html = f'''
<div style="position: fixed; bottom: 50px; left: 50px; z-index: 1000; width: 400px; height: auto; background-color: rgba(255, 255, 255, 0.8);
border-radius: 5px; border: 1px solid grey; padding: 10px; font-size: 14px; color: black;">
<b>{colormap.caption}</b><br>
<div style="display: flex; justify-content: space-between;">
<div>{tick_val[0]}</div>
<div>{tick_val[1]}</div>
<div>{tick_val[2]}</div>
<div>{tick_val[3]}</div>
<div>{tick_val[4]}</div>
</div>
<div style="background: linear-gradient(to right,
{'#0000FF'}, {'#3399FF'} {20}%,
{'#3399FF'} {20}%, {'#66CCFF'} {40}%,
{'#66CCFF'} {40}%, {'#FFFFFF'} {50}%,
{'#FFFFFF'} {50}%, {'#FF66CC'} {80}%,
{'#FF66CC'} {80}%, {'#FF3399'}); height: 10px;"></div>
</div>
'''
# Display the legend and caption on the map
aoi_map_bbox.get_root().html.add_child(folium.Element(legend_html))
# Add the title to the map
title_html = '''
<div style="position: fixed; top: 10px; right: 10px; z-index: 1000; background-color: rgba(255, 255, 255, 0.8); border-radius: 5px; border: 1px solid grey; padding: 10px;">
<b>Air-Sea CO₂ Flux, ECCO-Darwin</b><br>
September 2022
</div>
'''
# Display the title on the map
aoi_map_bbox.get_root().html.add_child(folium.Element(title_html))
# Visualize the map
aoi_map_bbox
Summary
In this notebook we have successfully completed the following steps for the STAC collection for the NASA Air-Sea CO₂ Flux ECCO Darwin dataset:
1. Install and import the necessary libraries
2. Fetch the collection from STAC collections using the appropriate endpoints
3. Count the number of existing granules within the collection
4. Map and compare the CO₂ Flux levels over the Coastal California area for two distinct months/years
5. Create a table that displays the minimum, maximum, and sum of the CO₂ Flux values for a specified region
6. Generate a time-series graph of the CO₂ Flux values for a specified region
If you have any questions regarding this user notebook, please contact us using the feedback form.