%pip uninstall -yq earthaccess
%pip install -q git+https://github.com/nsidc/earthaccess.git@explore
earthaccess: a NASA Earthdata API Client 🌍 in Python
Overview
TL;DR: earthaccess uses NASA APIs to search, preview, and access NASA datasets, on-prem or in the cloud, with 4 lines of Python.
There are many ways to access NASA datasets: we can use NASA's Earthdata Search portal, DAAC-specific websites or tools, or even data.gov! These web portals are great, but they are not designed for programmatic access and reproducible workflows, which matters more than ever in the age of the cloud and open, reproducible science. In this context, earthaccess aims to be a simple library that handles the important parts of the metadata so we can access or download data without having to worry about whether a given dataset is on-prem or in the cloud.
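As a rough sketch of what that looks like (the concept_id below is one of the collections we use later in this notebook, and the output folder is an arbitrary choice):

import earthaccess

earthaccess.login()  # authenticate against NASA Earthdata Login
results = earthaccess.search_data(concept_id="C2021957657-LPCLOUD", temporal=("2013", "2023"))
files = earthaccess.download(results, "./data")  # or earthaccess.open(results) to stream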
The core function of Auth is to deal with cloud credentials and remote file sessions (fsspec or requests): essentially, anything that requires you to log in to Earthdata. Most of this happens behind the scenes for you once you have authenticated.
NASA EDL and the Auth class
- Step 1. We need to open an account with NASA Earthdata; these credentials will allow us to access NASA datasets.
Once we have our account we can use it with earthaccess. Because we are using features that have not been merged yet, we'll install earthaccess from source this time.
import logging

logging.basicConfig(level=logging.INFO, force=True)
try:
    import earthaccess
    import xarray as xr
    from pyproj import Geod
    import numpy as np
    import hvplot.xarray
    from matplotlib import pyplot as plt
    from pprint import pprint
    import panel as pn
    import panel.widgets as pnw
    from pqdm.threads import pqdm
except ImportError as e:
    logging.warning("installing missing dependencies... ")
    %pip install -qq matplotlib hvplot pyproj xarray numpy h5netcdf panel pqdm
finally:
    import earthaccess
    import xarray as xr
    from pyproj import Geod
    import numpy as np
    import hvplot.xarray
    from matplotlib import pyplot as plt
    from pprint import pprint
    import panel.widgets as pnw
    from pqdm.threads import pqdm
    logging.info("Dependencies imported")
auth = earthaccess.login()
earthaccess.__version__
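Once we are logged in, the same credentials can be reused behind the scenes. As a hedged sketch of what Auth gives us (assuming the current earthaccess helpers):

# an authenticated requests session for HTTPS access to on-prem data
session = earthaccess.get_requests_https_session()

# temporary S3 credentials for direct, in-region access to cloud-hosted data
s3_credentials = auth.get_s3_credentials(daac="LPDAAC")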
Searching for data using a region of interest
import warnings

warnings.filterwarnings('ignore')
= "bosque_primavera.json"
path # path = "bosque_primavera.kml"
# path = "bosque_primavera.shp"
= earthaccess.load_geometry(path) geom
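load_geometry comes from the explore branch; if we don't have a file handy, a hand-built stand-in works too. Judging from how geom is used later in this notebook (unpacked into search_data and indexed as geom["polygon"]), it is assumed to be a dict with a polygon key holding (lon, lat) pairs that close the ring; the coordinates below are only a rough approximation of Bosque La Primavera:

# hypothetical hand-built equivalent: a counter-clockwise ring of (lon, lat) pairs
geom = {
    "polygon": [
        (-103.70, 20.55),
        (-103.45, 20.55),
        (-103.45, 20.75),
        (-103.70, 20.75),
        (-103.70, 20.55),  # repeat the first point to close the ring
    ]
}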
Search and Access with earthaccess
earthaccess uses NASA's search API (the Common Metadata Repository, CMR) to search for data from the different Distributed Active Archive Centers (DAACs). The data can be hosted by the DAACs themselves or in AWS; with earthaccess we don't need to think about this, because it handles the authentication for us. For reproducible workflows we just need the concept_id of the dataset (or collection, as NASA calls them). The concept_id of a collection can be found with earthaccess or using the NASA Earthdata Search portal.
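For example, a hedged sketch of looking up concept_ids from a keyword search (the keyword and the methods called on the returned collections are assumptions about the current API):

datasets = earthaccess.search_datasets(keyword="HLS", cloud_hosted=True)
for dataset in datasets[:5]:
    # each collection carries its CMR concept id in its metadata
    print(dataset.concept_id())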
results = earthaccess.search_data(
    concept_id=["C2613553260-NSIDC_CPRD", "C2237824918-ORNL_CLOUD", "C1908348134-LPDAAC_ECS", "C2021957657-LPCLOUD", "C2631841556-LPCLOUD"],
    temporal=("2013", "2023"),
    **geom,  # unpacking the geometry dict
)
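Before accessing anything we can take a quick look at what came back; a small sketch:

print(f"Granules found: {len(results)}")
# each result is a granule with its metadata and links to the actual files
pprint(results[0].data_links())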
Interactive metadata visualization with explore()
m = earthaccess.explore(results, roi=geom)
m
Accessing the data with .download() and .open()
Option 1. I’m not in AWS
%%time
results = earthaccess.search_data(
    concept_id=["C2021957657-LPCLOUD"],
    temporal=("2013", "2023"),
    **geom,  # unpacking the geometry dict
)
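Outside AWS the typical next step is to download the matching granules over authenticated HTTPS; a hedged sketch (the local folder is an arbitrary choice):

# earthaccess handles the Earthdata Login session and the transfers for us
files = earthaccess.download(results[0:4], "./data")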
Option 2. I'm in AWS us-west-2 ☁️
Analysis in place with S3 direct access
Same API, just a different origin
%%time
results = earthaccess.search_data(
    concept_id=["C2021957657-LPCLOUD"],
    temporal=("2013", "2023"),
    **geom,  # unpacking the geometry dict
)
%%time
files = earthaccess.open(results[0:4])
import rioxarray
ds = rioxarray.open_rasterio(files[0])
ds
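Since pqdm is already imported, we can also open several of these remote files in parallel threads; a sketch that assumes each granule is a raster rioxarray can read:

# open the first few files concurrently with a small thread pool
rasters = pqdm(files, rioxarray.open_rasterio, n_jobs=4)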
geometries = [
    {
        'type': 'Polygon',
        'coordinates': [geom["polygon"]],
    }
]
clipped = ds.rio.clip(geometries, drop=True, crs=4326)
clipped
clipped.plot()
="x", y="y", crs=xds.rio.estimate_utm_crs()) * map clipped.hvplot(x
Next Steps: Subsetting in the Cloud
After looking at the spatial coverage of some of the data we've been working with, there is a clear need to perform a data reduction.
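A quick way to see why: compare the in-memory footprint of the full granule with the cut-out we clipped above (a small illustrative check):

full_mb = ds.nbytes / 1e6
clipped_mb = clipped.nbytes / 1e6
print(f"full granule: {full_mb:.1f} MB, clipped to the ROI: {clipped_mb:.1f} MB")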
%%time
# accessing the data on-prem means downloading it if we are in a local environment, or "uploading it" if we are in the cloud
order = earthaccess.subset(results, roi=geom)