Searching for collections and granules
earthaccess Python library
earthaccess is a Python library to search for and download or stream NASA Earth science data with just a few lines of code.
“earthaccess revolutionizes NASA data access by drastically reducing the complexity and code required. Since open science is a collaborative effort involving people from different technical backgrounds, our team took the approach that data analysis can and should be made more inclusive and accessible by reducing the complexities of underlying systems.”
Luis López, NSIDC software developer and earthaccess creator
More complete documentation for using earthaccess to search for data can be found here.
Quickstart Guide with instructions for installing earthaccess.
Search for collections (i.e. data sets)
A basic search with earthaccess.search_datasets
earthaccess.search_datasets allows you to search for NASA collections using keywords. As a simple example, we’ll search for data sets with the keyword “icesat-2” associated with them.
If you haven’t already, import the library. We also use the pprint library in this example, so import it as well.
import earthaccess
import pprint

datasets = earthaccess.search_datasets(
    keyword="icesat-2"
)
search_datasets queries NASA’s Common Metadata Repository (CMR) and returns a list of collection objects associated with that keyword. We can find the number of collections, or CMR “hits”, using len().
print(len(datasets))
77
Other parameters accepted by earthaccess.search_datasets can be found here.
Collection objects have a summary method associated with them that we can use to explore certain metadata for the collection. We can use the built-in Python pprint module to view the summary metadata.
pprint.pprint(datasets[0].summary())
{'cloud-info': {'Region': 'us-west-2',
'S3BucketAndObjectPrefixNames': ['nsidc-cumulus-prod-protected/ATLAS/ATL08/007',
'nsidc-cumulus-prod-public/ATLAS/ATL08/007'],
'S3CredentialsAPIDocumentationURL': 'https://data.nsidc.earthdatacloud.nasa.gov/s3credentialsREADME',
'S3CredentialsAPIEndpoint': 'https://data.nsidc.earthdatacloud.nasa.gov/s3credentials'},
'concept-id': 'C3565574177-NSIDC_CPRD',
'file-type': "[{'FormatType': 'Native', 'Format': 'HDF5', "
"'FormatDescription': 'HTTPS'}]",
'get-data': ['https://search.earthdata.nasa.gov/search/granules?p=C3565574177-NSIDC_CPRD',
'https://openaltimetry.earthdatacloud.nasa.gov/data/',
'https://nsidc.org/data/data-access-tool/ATL08/versions/7/',
'https://cmr.earthdata.nasa.gov/virtual-directory/collections/C3565574177-NSIDC_CPRD',
'https://earthaccess.readthedocs.io/en/stable/'],
'short-name': 'ATL08',
'version': '007'}
For each collection, summary returns a subset of fields from the collection metadata, which CMR stores in the Unified Metadata Model (UMM):
- concept-id is a unique ID for the collection. It consists of an alphanumeric code and the provider-id specific to the DAAC (Distributed Active Archive Center).
- short-name is a quick way of referring to a collection instead of using its full title. It can be found on the collection landing page underneath the collection title, after ‘DATA SET ID’. See the table below for a list of the ShortNames for ICESat-2 collections.
- version is the version of the collection.
- file-type gives information about the file format of the collection's granules.
- get-data is a list of URLs that can be used to access the data, collection landing pages, and data tools.
- cloud-info is included for cloud-hosted data and gives the location of the S3 bucket that holds the data and where to get temporary AWS S3 credentials to access it. earthaccess handles these credentials and the links to the S3 buckets, so in general you won’t need to worry about this information.
In the ICESat-2 search results, the concept-id contains the provider-id NSIDC_CPRD, which identifies the cloud-hosted collections.
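To make the concept-id structure concrete, here is a minimal sketch that splits a concept-id into its alphanumeric code and provider-id, and checks whether a summary describes a cloud-hosted collection. It works on plain dictionaries shaped like the summary() output above. The function names are hypothetical, and the sketch assumes that the provider-id is everything after the first hyphen and that cloud-hosted collections carry a 'cloud-info' entry, as the ATL08 example does.

```python
# Hypothetical helpers that operate on dicts shaped like the output of summary().


def split_concept_id(concept_id: str) -> tuple[str, str]:
    """Split e.g. 'C3565574177-NSIDC_CPRD' into ('C3565574177', 'NSIDC_CPRD').

    Assumption: the provider-id is everything after the first hyphen.
    """
    code, sep, provider_id = concept_id.partition("-")
    if not sep or not code or not provider_id:
        raise ValueError(f"Unexpected concept-id format: {concept_id!r}")
    return code, provider_id


def is_cloud_hosted(summary: dict) -> bool:
    """Assumption: a summary with a 'cloud-info' entry describes cloud-hosted data."""
    return "cloud-info" in summary


# Trimmed copy of the ATL08 summary shown above
example_summary = {
    "cloud-info": {"Region": "us-west-2"},
    "concept-id": "C3565574177-NSIDC_CPRD",
    "short-name": "ATL08",
    "version": "007",
}

code, provider_id = split_concept_id(example_summary["concept-id"])
print(code, provider_id)                 # C3565574177 NSIDC_CPRD
print(is_cloud_hosted(example_summary))  # True
```

With real search results you could pass datasets[0].summary() instead of the trimmed dictionary.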
ICESat-2 products are generally referred to by their ShortNames:
| ShortName | Product Description |
|---|---|
| ATL03 | ATLAS/ICESat-2 L2A Global Geolocated Photon Data |
| ATL06 | ATLAS/ICESat-2 L3A Land Ice Height |
| ATL07 | ATLAS/ICESat-2 L3A Sea Ice Height |
| ATL08 | ATLAS/ICESat-2 L3A Land and Vegetation Height |
| ATL09 | ATLAS/ICESat-2 L3A Calibrated Backscatter Profiles and Atmospheric Layer Characteristics |
| ATL10 | ATLAS/ICESat-2 L3A Sea Ice Freeboard |
| ATL11 | ATLAS/ICESat-2 L3B Slope-Corrected Land Ice Height Time Series |
| ATL12 | ATLAS/ICESat-2 L3A Ocean Surface Height |
| ATL13 | ATLAS/ICESat-2 L3A Along Track Inland Surface Water Data |
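A keyword search such as "icesat-2" returns many collections, so it can help to pick out a specific product by its ShortName. The sketch below filters a list of summary dictionaries by short-name and, optionally, by version. The function name is hypothetical, and the sample entries are hypothetical stand-ins for real summary() output; with actual results you would build the list with [ds.summary() for ds in datasets].

```python
def select_by_short_name(summaries, short_name, version=None):
    """Return the summaries whose 'short-name' matches (and 'version', if given)."""
    return [
        s for s in summaries
        if s.get("short-name") == short_name
        and (version is None or s.get("version") == version)
    ]


# Hypothetical stand-ins for [ds.summary() for ds in datasets]
summaries = [
    {"short-name": "ATL08", "version": "007"},
    {"short-name": "ATL06", "version": "006"},
    {"short-name": "ATL06", "version": "005"},
]

print([s["version"] for s in select_by_short_name(summaries, "ATL06")])  # ['006', '005']
print(select_by_short_name(summaries, "ATL06", version="006"))
# [{'short-name': 'ATL06', 'version': '006'}]
```

Filtering locally like this avoids re-running the search. Alternatively, you can pass a short_name directly to the search, as in the granule example that follows.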
Search for granules (e.g. files) within a collection using filter parameters
A simple example with earthaccess.search_data
earthaccess.search_data allows you to search for specific granules (i.e. files or bundles of files) within a specified collection using filters.
files = earthaccess.search_data(
    short_name='ATL06',
    version='006',
    bounding_box=(-134.7, 58.9, -133.9, 59.2),  # (west, south, east, north) in degrees
    temporal=('2020-03-01', '2020-04-30'),      # (start date, end date)
)
search_data queries NASA’s Common Metadata Repository (CMR) and returns a list of granule objects from the data set specified by short_name and version that fall within the spatial and temporal bounds we provided. We can find the number of granules, or CMR “hits”, returned for those filter parameters using len().
print(len(files))
4
Other parameters accepted by earthaccess.search_data can be found here and additional usage examples can be found here.
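A swapped coordinate or date in the search filters is easy to make and hard to spot in the results, so it can be worth checking the filters before calling search_data. The sketch below is a minimal version under two assumptions: the bounding box is ordered (west, south, east, north) in decimal degrees, and dates are ISO 'YYYY-MM-DD' strings. The helper name build_search_kwargs is hypothetical. It only validates the filters and builds the keyword arguments, and leaves the actual earthaccess.search_data call to you.

```python
from datetime import date


def build_search_kwargs(short_name, version, bounding_box, temporal):
    """Validate filters and return keyword arguments for earthaccess.search_data.

    bounding_box: (west, south, east, north) in decimal degrees (assumed order).
    temporal: (start, end) as 'YYYY-MM-DD' strings.
    """
    west, south, east, north = bounding_box
    for lon in (west, east):
        if not -180 <= lon <= 180:
            raise ValueError(f"Longitude out of range: {lon}")
    for lat in (south, north):
        if not -90 <= lat <= 90:
            raise ValueError(f"Latitude out of range: {lat}")
    if south > north:
        raise ValueError("South edge must be less than or equal to north edge")
    # west > east is deliberately allowed: such a box may cross the antimeridian.

    start, end = (date.fromisoformat(d) for d in temporal)
    if start > end:
        raise ValueError("Start date must not be after end date")

    return {
        "short_name": short_name,
        "version": version,
        "bounding_box": (west, south, east, north),
        "temporal": (start.isoformat(), end.isoformat()),
    }


kwargs = build_search_kwargs(
    "ATL06", "006", (-134.7, 58.9, -133.9, 59.2), ("2020-03-01", "2020-04-30")
)
print(kwargs["bounding_box"])  # (-134.7, 58.9, -133.9, 59.2)
# With earthaccess installed: files = earthaccess.search_data(**kwargs)
```

Returning a plain dictionary keeps the check independent of earthaccess, so you can test it without a network connection or Earthdata login.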
We can get some basic information about a granule by printing it:
print(files[0])
Collection: {'EntryTitle': 'ATLAS/ICESat-2 L3A Land Ice Height V006'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -134.3399, 'Latitude': 59.03152}, {'Longitude': -134.44371, 'Latitude': 59.03709}, {'Longitude': -134.75456, 'Latitude': 57.4161}, {'Longitude': -134.6551, 'Latitude': 57.41076}, {'Longitude': -134.3399, 'Latitude': 59.03152}]}}]}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2020-03-10T12:15:10.646Z', 'EndingDateTime': '2020-03-10T12:15:58.724Z'}}
Size(MB): 3.034775733947754
Data: ['https://data.nsidc.earthdatacloud.nasa.gov/nsidc-cumulus-prod-protected/ATLAS/ATL06/006/2020/03/10/ATL06_20200310121504_11420606_006_01.h5']
size is a method, so use files[0].size() (with parentheses) to get just the granule size in MB.
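The data link printed above ends in a file name that encodes information about the granule. The sketch below parses it, assuming the ICESat-2 naming pattern ATLxx_YYYYMMDDhhmmss_ttttccss_vvv_rr.h5. In that pattern, tttt is the reference ground track (RGT), cc the cycle, ss the region or segment, vvv the version, and rr the revision. The function and field names are hypothetical. Products with a different naming scheme (ATL11, for example) will not match and raise a ValueError.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed ICESat-2 pattern: ATLxx_YYYYMMDDhhmmss_ttttccss_vvv_rr.h5
GRANULE_PATTERN = re.compile(
    r"^(?P<product>ATL\d{2})_"
    r"(?P<timestamp>\d{14})_"
    r"(?P<rgt>\d{4})(?P<cycle>\d{2})(?P<region>\d{2})_"
    r"(?P<version>\d{3})_(?P<revision>\d{2})\.h5$"
)


def parse_granule_url(url: str) -> dict:
    """Extract the fields encoded in an ICESat-2 granule file name."""
    filename = PurePosixPath(urlparse(url).path).name
    match = GRANULE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Not a recognized ICESat-2 granule name: {filename}")
    fields = match.groupdict()
    # Convert the 14-digit timestamp into a datetime object
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%Y%m%d%H%M%S")
    fields["filename"] = filename
    return fields


url = ("https://data.nsidc.earthdatacloud.nasa.gov/nsidc-cumulus-prod-protected/"
       "ATLAS/ATL06/006/2020/03/10/ATL06_20200310121504_11420606_006_01.h5")
info = parse_granule_url(url)
print(info["product"], info["rgt"], info["cycle"], info["region"])  # ATL06 1142 06 06
print(info["timestamp"])  # 2020-03-10 12:15:04
```

With real search results, the URL could come from files[0].data_links()[0].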