Collections & Granules

NASA Earth science data can feel overwhelming when you’re new to it: there are thousands of datasets, millions of files, and many tools for accessing them. A first step in combating the overwhelm is to understand how NASA organizes, and talks about, its data. Below we define two terms: collection and granule. These are, for the most part, synonymous with a dataset and a file, respectively.


What Are Collections and Granules?

📘 Collections

A collection is NASA’s version of a “dataset family.”
Every collection groups together many related files that share common features, such as:

  • The same satellite or instrument
  • The same science product type
  • The same processing level
  • A consistent file structure

A collection is the group that defines what the dataset is.

Examples:

  • SMAP L4 Global 3-hourly 9 km EASE-Grid Surface and Root Zone Soil Moisture Geophysical Data V008
  • ATLAS/ICESat-2 L3A Land and Vegetation Height V007

Collection metadata includes at least the following:

  • Product description
  • Spatial & temporal coverage
  • Data provider
  • Variables
  • Version

📄 Granules

A granule is the smallest discrete data unit in a collection. Depending on the data format, a granule may be a single file or a set of files that together represent one logical item (for example, a shapefile’s .shp, .shx, and .dbf components).

What each granule represents can vary from one science mission (and data product) to the next. For example, one granule could represent:

  • One SMAP L2 soil moisture half-orbit pass,
  • One AMSR2 daily polar gridded brightness temperature file, or
  • One MEaSUREs Greenland monthly ice sheet velocity magnitude mosaic

Granules also have their own metadata, such as:

  • Timestamps
  • A spatial bounding box
  • File size
  • Variables included, etc.

Granules are what you actually download and analyze. It is common to need MANY granules for a specific research objective!
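To make the "many granules" point concrete, here is a minimal sketch of how granule-level metadata could be used to pick out the files for one day and estimate the download size. The `Granule` class, its field names, and the three records are hypothetical placeholders for illustration, not a NASA data model or real granules.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Granule:
    """Hypothetical, simplified stand-in for a granule metadata record."""
    name: str
    start: datetime
    end: datetime
    size_mb: float


def granules_in_window(granules, window_start, window_end):
    """Return granules whose time range overlaps [window_start, window_end]."""
    return [g for g in granules if g.start <= window_end and g.end >= window_start]


# Placeholder records (made-up names, times, and sizes).
granules = [
    Granule("granule_a", datetime(2024, 1, 5, 0), datetime(2024, 1, 5, 3), 12.0),
    Granule("granule_b", datetime(2024, 1, 5, 3), datetime(2024, 1, 5, 6), 12.5),
    Granule("granule_c", datetime(2024, 1, 6, 0), datetime(2024, 1, 6, 3), 11.5),
]

# Select everything that overlaps 5 January 2024 and total the file sizes.
selected = granules_in_window(
    granules, datetime(2024, 1, 5), datetime(2024, 1, 5, 23, 59, 59)
)
total_mb = sum(g.size_mb for g in selected)
print([g.name for g in selected], total_mb)
```

In practice the timestamps, bounding box, and size would come from the granule metadata itself rather than being typed in by hand.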


NASA’s Metadata Backbone: CMR

NASA stores all metadata (for collections and granules, among other things) in a system called the Common Metadata Repository (CMR). CMR serves as NASA’s metadata database and search index, and provides an API that clients can use to access and interact with this metadata.

CMR powers:

  • Earthdata Search
  • DAAC search portals
  • APIs for programmatic discovery
  • Cloud-native data access workflows

CMR metadata is stored using the Unified Metadata Model (UMM):

  • UMM-C → Collection metadata
  • UMM-G → Granule metadata
  • UMM-Var → Variables
  • UMM-Svc → Services
  • UMM-T → Tools

Using the UMM allows CMR to host its metadata records in several supported native formats, with translation services available between formats.

Put simply, CMR is a high-performance, continuously evolving metadata system that catalogs all metadata for NASA’s Earth Science Data and Information System (ESDIS) Project. You can learn more about it here.
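As a rough illustration of programmatic discovery, the sketch below builds a CMR granule search URL for a collection, optionally restricted by time range and bounding box. The endpoint root and the parameter names (`short_name`, `version`, `temporal`, `bounding_box`, `page_size`) reflect our understanding of the public CMR search API and should be treated as assumptions; check the CMR API documentation before relying on them. The helper name `build_granule_search_url` is our own.

```python
from urllib.parse import urlencode

# Assumed root of the public CMR search API.
CMR_SEARCH_ROOT = "https://cmr.earthdata.nasa.gov/search"


def build_granule_search_url(short_name, version, temporal=None,
                             bounding_box=None, page_size=10):
    """Build (but do not send) a CMR granule search request URL.

    temporal:     optional (start, end) pair of ISO 8601 strings
    bounding_box: optional (west, south, east, north) in degrees
    """
    params = [
        ("short_name", short_name),
        ("version", version),
        ("page_size", str(page_size)),
    ]
    if temporal is not None:
        start, end = temporal
        params.append(("temporal", f"{start},{end}"))
    if bounding_box is not None:
        west, south, east, north = bounding_box
        params.append(("bounding_box", f"{west},{south},{east},{north}"))
    # Keep commas and colons unescaped so the URL stays readable.
    return f"{CMR_SEARCH_ROOT}/granules.json?{urlencode(params, safe=',:')}"


url = build_granule_search_url(
    "SPL4SMGP",
    "008",
    temporal=("2024-01-05T00:00:00Z", "2024-01-06T00:00:00Z"),
)
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client; the response lists the granule metadata records that match the query.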

Collection & Granule Names

NASA uses structured naming so files can be indexed, searched, and recognized by automated workflows.

📘 Collection Names

Each collection has a unique CMR identifier called a Concept-ID, which makes it possible to reference the collection efficiently across tools (for example, the Concept-ID of the SMAP collection shown below is C3480440870-NSIDC_CPRD). In addition, a collection has a:

1. Long Name (Human-Readable)

Example:
SMAP L4 Global 3-hourly 9 km EASE-Grid Surface and Root Zone Soil Moisture Geophysical Data V008

2. Short Name (Machine-Friendly)

Example:
SPL4SMGP

3. Version

Example:
008

The combination of short name + version uniquely identifies a dataset within CMR.
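The identifiers above can be handled in code along these lines. This is a sketch under assumptions: the Concept-ID pattern (a type prefix, a number, a hyphen, then a provider name) is inferred from the single example shown here, and the helper names `parse_concept_id` and `collection_key` are our own.

```python
import re
from typing import NamedTuple


class ConceptId(NamedTuple):
    kind: str      # leading letter(s); "C" is assumed to mark a collection
    number: int    # numeric part of the identifier
    provider: str  # text after the hyphen, assumed to name the data provider


# Pattern inferred from the example Concept-ID, not from a formal specification.
_CONCEPT_ID = re.compile(r"^([A-Z]+)(\d+)-([A-Za-z0-9_]+)$")


def parse_concept_id(text):
    """Split a Concept-ID string into its assumed parts."""
    match = _CONCEPT_ID.match(text)
    if match is None:
        raise ValueError(f"Unrecognized Concept-ID: {text!r}")
    kind, number, provider = match.groups()
    return ConceptId(kind, int(number), provider)


def collection_key(short_name, version):
    """Short name + version, usable as a dictionary key for one collection."""
    return (short_name, version)


concept = parse_concept_id("C3480440870-NSIDC_CPRD")
catalog = {collection_key("SPL4SMGP", "008"): concept}
print(catalog[("SPL4SMGP", "008")])
```

Keeping the version as a string preserves the leading zeros ("008"), which matter when the value is passed back to a search tool.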

📄 Granule Names

Granule names can be lengthy. They typically encode information about the collection the granule belongs to, as well as specifics about the granule itself. For example, most products have a naming “formula” that includes the following:

  • Product short name
  • Date and optional time
  • Version
  • Spatial identifier (if applicable)
  • File type

Note: Not all naming structures are the same across data products! The User Guide (linked on the dataset landing page; see NSIDC-0803 for an example) provides the exact naming convention for each data product.

Some granules cover a global or regional extent, in which case no explicit spatial identifier appears in the filename (many gridded polar products are examples of this; see Example 1 below).

Other products divide their data into smaller spatial segments, and the granule name will include an identifier to indicate which portion of the Earth the file represents (see Example 2).

Example 1: AMSR2 Daily Polar Gridded Sea Ice Concentrations, Version 2

One granule from this product is:

NSIDC0803_SEAICE_AMSR2_S_20240105_v2.0.nc

This product does not include a spatial identifier because each granule covers a full polar region.

Naming Breakdown

| Component | Example | Meaning |
| --- | --- | --- |
| Short name | NSIDC0803 | AMSR2 Daily Polar Gridded Sea Ice Concentrations Product |
| Information | SEAICE_AMSR2_S | Variable, satellite, and hemisphere |
| Date/time | 20240105 | YYYYMMDD format |
| Version | v2.0 | Collection version |
| File type | .nc | NetCDF format |
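A filename like the one in Example 1 can be split into its components with a regular expression. The sketch below is written against this single example only; the pattern, the field names, and the assumption that the hemisphere code is either N or S are ours, so consult the User Guide for the authoritative convention.

```python
import re
from datetime import datetime

# Pattern written to match the one example filename shown above.
AMSR2_NAME = re.compile(
    r"^(?P<short_name>[A-Z0-9]+)"
    r"_(?P<variable>[A-Z]+)"
    r"_(?P<satellite>[A-Z0-9]+)"
    r"_(?P<hemisphere>[NS])"          # assumption: N or S
    r"_(?P<date>\d{8})"               # YYYYMMDD
    r"_v(?P<version>\d+\.\d+)"
    r"\.(?P<extension>[a-z0-9]+)$"
)


def parse_amsr2_name(filename):
    """Return the filename components as a dict, with the date parsed."""
    match = AMSR2_NAME.match(filename)
    if match is None:
        raise ValueError(f"Unexpected filename: {filename!r}")
    fields = match.groupdict()
    fields["date"] = datetime.strptime(fields["date"], "%Y%m%d").date()
    return fields


info = parse_amsr2_name("NSIDC0803_SEAICE_AMSR2_S_20240105_v2.0.nc")
print(info)
```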

Example 2: MODIS/Terra Snow Cover Daily L3 Global 500m SIN Grid, Version 61

One granule from this product is:

MOD10A1.A2003001.h13v03.061.2019130062218.hdf

This product does include a spatial identifier because each granule covers only a portion of the global grid.

Naming Breakdown

| Component | Example | Meaning |
| --- | --- | --- |
| Short name | MOD10A1 | MODIS/Terra Snow Cover Daily L3 Global 500m SIN Grid Product |
| Date/time acquired (A) | A2003001 | AYYYYDDD format (year and day of year) |
| Spatial identifier | h13v03 | Horizontal and vertical tile number |
| Version | 061 | Collection version |
| Production date | 2019130062218 | YYYYDDDHHMMSS format in GMT |
| File type | .hdf | HDF-EOS format |
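The MODIS filename in Example 2 uses day-of-year dates: the acquisition field is "A" followed by a four-digit year and a three-digit day of year, and the production field adds hours, minutes, and seconds. The sketch below parses the example name and converts both dates; the pattern and field names are our own, written against this one example rather than taken from the product documentation.

```python
import re
from datetime import datetime, timedelta

# Pattern written to match the one example filename shown above.
MODIS_NAME = re.compile(
    r"^(?P<short_name>[A-Z0-9]+)"
    r"\.A(?P<acq_year>\d{4})(?P<acq_doy>\d{3})"
    r"\.h(?P<h>\d{2})v(?P<v>\d{2})"
    r"\.(?P<version>\d{3})"
    r"\.(?P<prod_year>\d{4})(?P<prod_doy>\d{3})(?P<prod_time>\d{6})"
    r"\.(?P<extension>[a-z0-9]+)$"
)


def _from_day_of_year(year, day_of_year):
    """Convert a year and 1-based day of year to a datetime at midnight."""
    return datetime(year, 1, 1) + timedelta(days=day_of_year - 1)


def parse_modis_name(filename):
    """Return the filename components as a dict, with dates converted."""
    m = MODIS_NAME.match(filename)
    if m is None:
        raise ValueError(f"Unexpected filename: {filename!r}")
    t = m["prod_time"]  # hhmmss
    produced = _from_day_of_year(int(m["prod_year"]), int(m["prod_doy"])).replace(
        hour=int(t[0:2]), minute=int(t[2:4]), second=int(t[4:6])
    )
    return {
        "short_name": m["short_name"],
        "acquired": _from_day_of_year(int(m["acq_year"]), int(m["acq_doy"])).date(),
        "tile": (int(m["h"]), int(m["v"])),  # (horizontal, vertical)
        "version": m["version"],
        "produced": produced,
        "extension": m["extension"],
    }


info = parse_modis_name("MOD10A1.A2003001.h13v03.061.2019130062218.hdf")
print(info["acquired"], info["tile"], info["produced"])
```

Here day 001 of 2003 becomes 1 January 2003, and day 130 of 2019 becomes 10 May 2019.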