Accessing and working with ICESat-2 data in the cloud

1. Tutorial Overview
Note: This is an updated version of the notebook that was presented to the NSIDC DAAC User Working Group in May 2022
This notebook demonstrates searching for cloud-hosted ICESat-2 data and directly accessing Land Ice Height (ATL06) granules from an Amazon Elastic Compute Cloud (EC2) instance using the `earthaccess` package. NASA data "in the cloud" are stored in Amazon Web Services (AWS) Simple Storage Service (S3) buckets. Direct access is an efficient way to work with data stored in an S3 bucket when you are working in the cloud: cloud-hosted granules can be opened and loaded into memory without the need to download them first, which allows you to take advantage of the scalability and power of cloud computing.
The AWS cloud is divided into geographical regions. To have direct access to data stored in a region, our compute instance - a virtual computer that we create to perform processing operations in place of using our own desktop or laptop - must be in the same region as the data. This is a fundamental concept of "analysis in place". NASA cloud-hosted data are in the us-west-2 region, so your compute instance must also be in us-west-2. If we wanted direct access to data stored in another region, we would start a compute instance in that region.
As an example data collection, we use ICESat-2 Land Ice Height (ATL06) over the Juneau Icefield, AK, for March and April 2020. ICESat-2 data granules, including ATL06, are stored in HDF5 format. We demonstrate how to open an HDF5 granule and access data variables using `xarray`. Land ice heights are then plotted using `hvplot`.
`earthaccess` is a package developed by Luis Lopez (NSIDC developer) to allow easy search of the NASA Common Metadata Repository (CMR) and download of NASA data collections. It can be used for programmatic search and access of both DAAC-hosted and cloud-hosted data. It manages authentication using Earthdata Login credentials, which are then used to obtain the S3 tokens needed for S3 direct access. https://github.com/nsidc/earthaccess
Credits
The notebook was created by Andy Barrett, NSIDC, updated by Jennifer Roebuck, NSIDC, and is based on notebooks developed by Luis Lopez and Mikala Beig, NSIDC.
For questions regarding the notebook, or to report problems, please create a new issue in the NSIDC-Data-Tutorials repo.
Learning Objectives
By the end of this demonstration you will be able to:
1. use `earthaccess` to search for ICESat-2 data using spatial and temporal filters and explore search results;
2. open data granules using direct access to the ICESat-2 S3 bucket;
3. load an HDF5 group into an `xarray.Dataset`;
4. visualize the land ice heights using `hvplot`.
Prerequisites
- An EC2 instance in the us-west-2 region. NASA cloud-hosted data are in the us-west-2 region, so your EC2 instance must be in us-west-2 as well. An EC2 instance is a virtual computer that you create to perform processing operations in place of using your own desktop or laptop. Details on how to set up an instance can be found here.
- An Earthdata Login is required for data access. If you don’t have one, you can register for one here.
- A .netrc file that contains your Earthdata Login credentials, in your home directory. The current recommended practice for authentication is to create a .netrc file in your home directory following these instructions (Step 1) and to use the .netrc file for authentication when required for data access during the tutorial (see the example after this list).
- The nsidc-tutorials environment is set up and activated. This README has setup instructions.
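For reference, a minimal `.netrc` for Earthdata Login looks like the single line below; replace the placeholder username and password with your own credentials and keep the file readable only by you (for example with `chmod 600 ~/.netrc`):

```
machine urs.earthdata.nasa.gov login your_username password your_password
```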
Example of end product
At the end of this tutorial, the following figure will be generated:
### Time requirement
Allow approximately 20 minutes to complete this tutorial.
2. Tutorial steps
Import Packages
The first step in any Python script or notebook is to import packages. This tutorial requires the following packages:

- `earthaccess`, which enables Earthdata Login authentication, retrieves AWS credentials, and provides collection and granule search as well as S3 access;
- `xarray`, used to load the data;
- `hvplot`, used to visualize the land ice height data.

We are going to import the whole `earthaccess` package.

We will also import the whole `xarray` package, but use the standard short name `xr` via the `import <package> as <short_name>` syntax. We could use anything for a short name, but `xr` is an accepted standard that most `xarray` users are familiar with.

We only need the `xarray` module from `hvplot`, so we import it using the `import <package>.<module>` syntax.
```python
# For searching NASA data
import earthaccess

# For reading data, analysis and plotting
import xarray as xr
import hvplot.xarray
import pprint
```
Authenticate
The first step is to get the correct authentication to access cloud-hosted ICESat-2 data. This is all done through Earthdata Login. The `login` method also gets the correct AWS credentials.

Login requires your Earthdata Login username and password. The `login` method will automatically search for these credentials as environment variables or in a `.netrc` file, and if those aren't available it will prompt us to enter our username and password. We use the `.netrc` strategy here. A `.netrc` file is a text file located in our home directory that contains login information for remote machines. If we don't have a `.netrc` file, `login` can create one for us.
```python
earthaccess.login(strategy='interactive', persist=True)
auth = earthaccess.login()
```
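If you prefer not to keep a `.netrc` file, `earthaccess` can also read credentials from environment variables. The following is a sketch, not part of the original notebook; the variable names and the `environment` strategy follow the `earthaccess` documentation:

```python
# A hedged alternative: authenticate from environment variables instead of a .netrc file.
# Replace the placeholder values with your own Earthdata Login credentials.
import os
os.environ['EARTHDATA_USERNAME'] = 'your_username'  # placeholder
os.environ['EARTHDATA_PASSWORD'] = 'your_password'  # placeholder

auth = earthaccess.login(strategy='environment')
```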
Search for ICESat-2 Collections
`earthaccess` leverages the Common Metadata Repository (CMR) API to search for collections and granules. Earthdata Search also uses the CMR API.

We can use the `search_datasets` method to search for ICESat-2 collections by setting `keyword='ICESat-2'`.

This will display the number of data collections (data sets) that meet this search criterion.
```python
Query = earthaccess.search_datasets(keyword='ICESat-2')
```
In this case there are 65 collections that have the keyword ICESat-2.
The `search_datasets` method returns a Python list of `DataCollection` objects. We can view the metadata for each collection in long form by passing a `DataCollection` object to print, or as a summary using the `summary` method. We can also use the `pprint` function to Pretty Print each object.

We will do this for the first 10 results (objects).
```python
for collection in Query[:10]:
    pprint.pprint(collection.summary(), sort_dicts=True, indent=4)
    print('')
```
For each collection, `summary` returns a subset of fields from the collection metadata and the Unified Metadata Model (UMM):

- `concept-id` is a unique id for the collection. It consists of an alphanumeric code and the provider-id specific to the DAAC (Distributed Active Archive Center). You can use the concept-id to search for data granules.
- `short_name` is a quick way of referring to a collection (instead of using the full title). It can be found on the collection landing page underneath the collection title after 'DATA SET ID'. See the table below for a list of the short names for ICESat-2 collections.
- `version` is the version of each collection.
- `file-type` gives information about the file format of the collection granules.
- `get-data` is a collection of URLs that can be used to access the data, collection landing pages and data tools.
- `cloud-info` is for cloud-hosted data and provides additional information about the location of the S3 bucket that holds the data and where to get temporary AWS S3 credentials to access the S3 buckets. `earthaccess` handles these credentials and the links to the S3 buckets, so in general you won't need to worry about this information.
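Since `summary` returns a plain Python dictionary, individual fields can also be pulled out by key. The snippet below is a sketch, not part of the original notebook; the key names (`concept-id`, `short-name`, `version`) are assumptions based on the printed summaries above and may vary between `earthaccess` versions:

```python
# Inspect selected fields of the first collection's summary dictionary.
# Key names are assumed from the printed summaries above; adjust them if your
# earthaccess version formats them differently.
first_summary = Query[0].summary()
for key in ['concept-id', 'short-name', 'version']:
    print(key, ':', first_summary.get(key, 'not found'))
```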
For the ICESat-2 search results, the concept-id contains one of two provider-ids: `NSIDC_ECS`, which is for the on-premises collections, and `NSIDC_CPRD`, which is for the cloud-hosted collections.
For ICESat-2, `ShortNames` are generally how different products are referred to.
| ShortName | Product Description |
|---|---|
| ATL03 | ATLAS/ICESat-2 L2A Global Geolocated Photon Data |
| ATL06 | ATLAS/ICESat-2 L3A Land Ice Height |
| ATL07 | ATLAS/ICESat-2 L3A Sea Ice Height |
| ATL08 | ATLAS/ICESat-2 L3A Land and Vegetation Height |
| ATL09 | ATLAS/ICESat-2 L3A Calibrated Backscatter Profiles and Atmospheric Layer Characteristics |
| ATL10 | ATLAS/ICESat-2 L3A Sea Ice Freeboard |
| ATL11 | ATLAS/ICESat-2 L3B Slope-Corrected Land Ice Height Time Series |
| ATL12 | ATLAS/ICESat-2 L3A Ocean Surface Height |
| ATL13 | ATLAS/ICESat-2 L3A Along Track Inland Surface Water Data |
Search for cloud-hosted data
For most collections, to search for only data in the cloud, the `cloud_hosted` parameter can be set to `True`.
```python
Query = earthaccess.search_datasets(
    keyword='ICESat-2',
    cloud_hosted=True
)
```
Search a data set using spatial and temporal filters
We can use the `search_data` method to search for granules within a data set by location and time using spatial and temporal filters. In this example, we will search for data granules from the ATL06 version 006 cloud-hosted data set over the Juneau Icefield, AK, for March and April 2020.

The temporal range is identified with standard date strings, and the latitude-longitude corners of a bounding box are specified. Polygons and points, as well as shapefiles, can also be specified (see the sketch after the bounding-box search below).

This will display the number of granules that match our search.
```python
results = earthaccess.search_data(
    short_name='ATL06',
    version='006',
    cloud_hosted=True,
    bounding_box=(-134.7, 58.9, -133.9, 59.2),
    temporal=('2020-03-01', '2020-04-30'),
)
```
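As mentioned above, granule searches are not limited to bounding boxes. The sketch below is not part of the original notebook: it assumes the `polygon` and `point` parameters of `search_data`, which are passed through to the underlying CMR granule query, and uses rough, made-up coordinates around the Juneau Icefield purely for illustration:

```python
# A hedged sketch of other spatial filters. Polygon vertices are (lon, lat) pairs,
# listed counter-clockwise and closed back on the first vertex; coordinates here
# are illustrative only.
polygon_results = earthaccess.search_data(
    short_name='ATL06',
    version='006',
    polygon=[(-134.7, 58.9), (-133.9, 58.9), (-133.9, 59.2), (-134.7, 59.2), (-134.7, 58.9)],
    temporal=('2020-03-01', '2020-04-30'),
)

# A single point of interest can be used in the same way.
point_results = earthaccess.search_data(
    short_name='ATL06',
    version='006',
    point=(-134.3, 59.05),
    temporal=('2020-03-01', '2020-04-30'),
)
```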
To display the rendered metadata, including the download link, granule size and two images, we will use `display`. In the example below, all 4 results are shown.

The download link is an HTTPS link that can be used to download the granule to your local machine. This is similar to downloading DAAC-hosted data, but in this case the data are coming from the Earthdata Cloud. For NASA data in the Earthdata Cloud, there is no charge to the user for egress from AWS cloud servers. This is not the case for other data in the cloud.

Note that the `[None, None, None, None]` displayed at the end can be ignored; it has no meaning in relation to the metadata. It is simply the value of the list comprehension, because `display` returns `None` for each result.
```python
[display(r) for r in results]
```
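If you only want the access URLs rather than the rendered metadata, each result also exposes them programmatically. The snippet below is a sketch, not part of the original notebook; `data_links` and its `access` argument follow the `earthaccess` documentation and may differ slightly between versions:

```python
# List access URLs for the first granule: the default gives HTTPS download links,
# while access='direct' requests the S3 URLs used for in-region direct access.
print(results[0].data_links())
print(results[0].data_links(access='direct'))
```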
Use Direct-Access to open, load and display data stored on S3
Direct access to data in an S3 bucket is a two-step process. First, the files are opened using the `open` method. The `auth` object created at the start of the notebook is used to provide Earthdata Login authentication and AWS credentials.

The next step is to load the data. In this case, data are loaded into an `xarray.Dataset`. Data could also be read into `numpy` arrays or a `pandas.DataFrame`. However, each granule would then have to be read using a package that reads HDF5 granules, such as `h5py`. `xarray` does all of this under the hood in a single line, but only for a single group in the HDF5 granule*.
*ICESat-2 measures photon returns from 3 beam pairs numbered 1, 2 and 3 that each consist of a left and a right beam. In this case, we are interested in the left ground track (gt) of beam pair 1.
```python
files = earthaccess.open(results)
ds = xr.open_dataset(files[1], group='/gt1l/land_ice_segments')
ds
```
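As a quick aside on the `pandas.DataFrame` option mentioned above, the loaded group can be converted directly from `xarray`. This is a sketch, not part of the original notebook, and assumes the `h_li`, `latitude` and `longitude` variables are present in the `land_ice_segments` group, as they are for ATL06:

```python
# Convert a few variables of the land_ice_segments group to a pandas DataFrame
# for tabular analysis.
df = ds[['h_li', 'latitude', 'longitude']].to_dataframe()
df.head()
```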
`hvplot` is an interactive plotting tool that is useful for exploring data.
```python
ds['h_li'].hvplot(kind='scatter', s=2)
```
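The heights can also be plotted against another variable in the group, for example latitude. The call below is a sketch, not part of the original notebook; it uses `hvplot`'s tabular `scatter` interface on the whole Dataset and assumes `latitude` was loaded as a variable, so the exact call may vary with your `hvplot` version:

```python
# Plot land ice height against latitude to see the along-track variation.
ds.hvplot.scatter(x='latitude', y='h_li', s=2)
```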
3. Learning outcomes recap
We have learned how to:

1. use `earthaccess` to search for ICESat-2 data using spatial and temporal filters and explore search results;
2. open data granules using direct access to the ICESat-2 S3 bucket;
3. load an HDF5 group into an `xarray.Dataset`;
4. visualize the land ice heights using `hvplot`.
4. Additional resources
For general information about NSIDC DAAC data in the Earthdata Cloud:
FAQs About NSIDC DAAC’s Earthdata Cloud Migration
NASA Earthdata Cloud Data Access Guide
Additional tutorials and How Tos: