earthaccess: a NASA Earthdata API Client 🌍 in Python

Overview

TL;DR: earthaccess uses NASA APIs to search, preview, and access NASA datasets, on-prem or in the cloud, with 4 lines of Python.

There are many ways to access NASA datasets. We can use NASA’s Earthdata Search portal, DAAC-specific websites or tools, or even data.gov! These web portals are great, but they are not designed for programmatic access and reproducible workflows, both of which are extremely important in the age of the cloud and reproducible open science. In this context, earthaccess aims to be a simple library that deals with the important parts of the metadata, so we can access or download data without having to worry about whether a given dataset is on-prem or in the cloud.

The core function of auth is to deal with cloud credentials and remote file sessions (fsspec or requests): essentially, anything that requires you to log in to Earthdata. Most of this happens behind the scenes once you have been authenticated.

NASA EDL and the Auth class

  • Step 1. We need to open an account with NASA Earthdata; these credentials will allow us to access NASA datasets.

Once we have our account we can use it with earthaccess. Because we are using features that have not been merged yet, we’ll install it from source this time.

%pip uninstall -yq earthaccess
%pip install -q git+https://github.com/nsidc/earthaccess.git@explore
import logging
logging.basicConfig(level=logging.INFO,
                    force = True)

try:
    import earthaccess
    import xarray as xr
    from pyproj import Geod
    import numpy as np
    import hvplot.xarray
    from matplotlib import pyplot as plt
    from pprint import pprint
    import panel as pn
    import panel.widgets as pnw
    from pqdm.threads import pqdm
except ImportError as e:
    logging.warning("installing missing dependencies... ")
    %pip install -qq matplotlib hvplot pyproj xarray numpy h5netcdf panel pqdm
finally:
    import earthaccess
    import xarray as xr
    from pyproj import Geod
    import numpy as np
    import hvplot.xarray
    from matplotlib import pyplot as plt
    from pprint import pprint
    import panel as pn
    import panel.widgets as pnw
    from pqdm.threads import pqdm
    logging.info("Dependencies imported")
auth = earthaccess.login()
earthaccess.__version__
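
One of the places `earthaccess.login()` can find credentials is a `.netrc` file containing an entry for `urs.earthdata.nasa.gov`, the Earthdata Login host. The sketch below shows what such an entry looks like and how it can be read back with the standard library. The helper names `write_netrc_entry` and `read_edl_credentials` are hypothetical (they are not part of earthaccess), and the demo writes to a temporary directory with placeholder credentials rather than touching a real `~/.netrc`.

```python
import netrc
import stat
import tempfile
from pathlib import Path

# Host name used by NASA Earthdata Login (EDL).
EDL_HOST = "urs.earthdata.nasa.gov"


def write_netrc_entry(username, password, path):
    """Write a single EDL entry to `path` (hypothetical helper, not an earthaccess API)."""
    path = Path(path)
    path.write_text(f"machine {EDL_HOST} login {username} password {password}\n")
    # Restrict the file to the current user, as is customary for .netrc files.
    path.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return path


def read_edl_credentials(path):
    """Return (login, password) for the EDL host, or None if there is no entry."""
    entry = netrc.netrc(str(path)).authenticators(EDL_HOST)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder credentials, for illustration only.
    demo_file = write_netrc_entry("my_user", "my_password", Path(tmp) / ".netrc")
    print(demo_file.read_text().strip())
    print(read_edl_credentials(demo_file))

# With a real entry in ~/.netrc, the login call becomes non-interactive:
# auth = earthaccess.login(strategy="netrc")
```

Calling `earthaccess.login(persist=True)` after an interactive login is another way to have the entry written for you.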

Searching for data using a region of interest

import warnings
warnings.filterwarnings('ignore')

path = "bosque_primavera.json"
# path = "bosque_primavera.kml" 
# path = "bosque_primavera.shp"
geom = earthaccess.load_geometry(path)
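
The `**geom` unpacking used in the searches in this section implies that `load_geometry` returns a dictionary of search keyword arguments, such as a `polygon` list of (lon, lat) pairs. The following is a minimal sketch of that idea for a GeoJSON polygon using only the standard library. It is not the actual implementation of `load_geometry`; the function names and the sample rectangle are assumptions made for illustration. NASA's CMR expects polygon points in counter-clockwise order with the first point repeated at the end, so the sketch normalizes both.

```python
import json


def ring_is_ccw(ring):
    """Return True if a closed ring of (lon, lat) pairs is counter-clockwise (shoelace formula)."""
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return twice_area > 0


def geojson_to_polygon_kwargs(geojson):
    """Turn a GeoJSON Polygon, Feature, or FeatureCollection into {"polygon": [...]}.

    Hypothetical helper: only the outer ring of the first polygon is used.
    """
    geometry = geojson
    if geometry.get("type") == "FeatureCollection":
        geometry = geometry["features"][0]
    if geometry.get("type") == "Feature":
        geometry = geometry["geometry"]
    if geometry.get("type") != "Polygon":
        raise ValueError("only Polygon geometries are handled in this sketch")

    ring = [(float(p[0]), float(p[1])) for p in geometry["coordinates"][0]]
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring
    if not ring_is_ccw(ring):
        ring.reverse()  # a reversed closed ring is still closed
    return {"polygon": ring}


# Hypothetical rectangle, written in clockwise order on purpose.
sample = json.loads("""
{"type": "Feature", "properties": {},
 "geometry": {"type": "Polygon", "coordinates": [[
   [-103.7, 20.5], [-103.7, 20.7], [-103.5, 20.7], [-103.5, 20.5], [-103.7, 20.5]
 ]]}}
""")

sample_geom = geojson_to_polygon_kwargs(sample)
print(sample_geom["polygon"])
# The result could then be unpacked into a search, e.g. earthaccess.search_data(..., **sample_geom)
```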

Search and Access with earthaccess

earthaccess uses NASA’s search API to search for data from the different Distributed Active Archive Centers (DAACs). The data can be hosted on-prem by the DAACs or in AWS; with earthaccess we don’t need to think about this because it will handle the authentication for us. For reproducible workflows we just need to use the concept_id of the dataset (or collection, as NASA calls it).

The concept_id of a collection can be found with earthaccess or through NASA’s Earthdata Search portal.
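
A concept_id is structured: a letter prefix for the concept type (`C` for collections, `G` for granules), a unique number, and the ID of the provider that hosts the record. The sketch below splits the IDs used in this section into those parts, which makes it easy to see which ones are served from cloud providers such as `LPCLOUD`. The `ConceptId` type and `parse_concept_id` function are hypothetical names, not part of earthaccess.

```python
import re
from typing import NamedTuple

# Only the two prefixes relevant to this section are named here.
CONCEPT_TYPES = {"C": "collection", "G": "granule"}

_CONCEPT_ID = re.compile(r"^([A-Z]+)(\d+)-(\w+)$")


class ConceptId(NamedTuple):
    prefix: str
    number: int
    provider: str

    @property
    def kind(self) -> str:
        """Human-readable concept type, or "other" for prefixes not listed above."""
        return CONCEPT_TYPES.get(self.prefix, "other")


def parse_concept_id(value: str) -> ConceptId:
    """Split a concept_id such as "C2021957657-LPCLOUD" into its parts."""
    match = _CONCEPT_ID.match(value.strip())
    if match is None:
        raise ValueError(f"not a valid concept_id: {value!r}")
    prefix, number, provider = match.groups()
    return ConceptId(prefix, int(number), provider)


ids = [
    "C2613553260-NSIDC_CPRD",
    "C2237824918-ORNL_CLOUD",
    "C1908348134-LPDAAC_ECS",
    "C2021957657-LPCLOUD",
    "C2631841556-LPCLOUD",
]
for cid in ids:
    parsed = parse_concept_id(cid)
    print(f"{cid}: {parsed.kind} hosted by provider {parsed.provider}")
```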

results = earthaccess.search_data(
    concept_id = ["C2613553260-NSIDC_CPRD", "C2237824918-ORNL_CLOUD", "C1908348134-LPDAAC_ECS", "C2021957657-LPCLOUD", "C2631841556-LPCLOUD"],
    temporal = ("2013", "2023"),
    # unpacking the dict
    **geom
)
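
Because this search spans five collections, it helps to check how many granules came back for each one before mapping or downloading them. The sketch below assumes each result behaves like a dictionary that carries CMR's `meta` block with a `collection-concept-id` key; that layout is an assumption about the result objects, so the example runs on hand-made stand-in records (with made-up granule IDs) rather than on real search results.

```python
from collections import Counter


def count_by_collection(granules):
    """Count results per parent collection (hypothetical helper).

    Assumes each item is dict-like with item["meta"]["collection-concept-id"].
    """
    return Counter(g["meta"]["collection-concept-id"] for g in granules)


# Stand-in records that mimic the assumed layout; the granule IDs are invented.
fake_results = [
    {"meta": {"concept-id": "G0000000001-LPCLOUD", "collection-concept-id": "C2021957657-LPCLOUD"}},
    {"meta": {"concept-id": "G0000000002-LPCLOUD", "collection-concept-id": "C2021957657-LPCLOUD"}},
    {"meta": {"concept-id": "G0000000003-ORNL_CLOUD", "collection-concept-id": "C2237824918-ORNL_CLOUD"}},
]

for collection_id, n in count_by_collection(fake_results).most_common():
    print(collection_id, n)

# With real results this would be: count_by_collection(results)
```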

Interactive metadata visualization with explore()

m = earthaccess.explore(results, roi=geom)
m

Accessing the data with .download() and .open()

Option 1. I’m not in AWS

%%time

results = earthaccess.search_data(
    concept_id = ["C2021957657-LPCLOUD"],
    temporal = ("2013", "2023"),
    # unpacking the dict
    **geom
)
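
Outside AWS, the usual next step after a search like this is to download the matching files with `earthaccess.download()`. The sketch below wraps that call in a small helper that creates the target folder and limits how many granules are fetched. The helper name `download_granules`, the `data` folder, and the default limit of 4 are assumptions chosen for illustration; the call itself needs a prior `earthaccess.login()` and network access.

```python
from pathlib import Path


def download_granules(granules, local_path="data", limit=4):
    """Download the first `limit` granules into `local_path` (hypothetical helper).

    Returns whatever earthaccess.download() returns, i.e. the local file paths.
    """
    # Imported lazily so this helper can be defined even where earthaccess is not installed.
    import earthaccess

    target = Path(local_path)
    target.mkdir(parents=True, exist_ok=True)
    return earthaccess.download(granules[:limit], local_path=str(target))


# Usage, after logging in and running the search above:
# files = download_granules(results)
# print(files)
```

Limiting the number of granules keeps a first test run short; raise `limit` once the workflow behaves as expected.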

Option 2. I’m in AWS us-west-2 ☁️

Analysis in place with S3 direct access

Same API, just a different origin

%%time

results = earthaccess.search_data(
    concept_id = ["C2021957657-LPCLOUD"],
    temporal = ("2013", "2023"),
    # unpacking the dict
    **geom
)
%%time
files = earthaccess.open(results[0:4])
import rioxarray

ds = rioxarray.open_rasterio(files[0])
ds
geometries = [
    {
        'type': 'Polygon',
        'coordinates':[geom["polygon"]]
    }
]
clipped = ds.rio.clip(geometries, drop=True, crs=4326)
clipped
clipped.plot()

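# NOTE: `map` is not defined in this section; it is assumed to be a basemap overlay created elsewhere.
clipped.hvplot(x="x", y="y", crs=ds.rio.estimate_utm_crs()) * map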

Next Steps: Subsetting in the Cloud

After looking at the spatial coverage of some of the data we’ve been working with, there is a clear need to perform a data reduction.

%%time
# Accessing on-prem data means downloading it if we are in a local environment, or "uploading" it if we are in the cloud.
order = earthaccess.subset(results, roi=polygon)