2024-04-19
Data Strategies enhance collaboration and reproducible science.
Good to start from the beginning of a project, great to start from where you are now.
You.
Your team.
The scientific community.
http://r4ds.hadley.nz/
Cloud services are infrastructure, platforms and software hosted in the cloud and made available to users via an API, often accessed via a web interface.
NASA’s Harmony (https://harmony.earthdata.nasa.gov/) can subset, reproject, reformat, and serve data.
This might save processing steps.
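import earthaccess
import xarray as xr

# Log in to NASA Earthdata using credentials stored in ~/.netrc
auth = earthaccess.login(strategy='netrc')

# Search one collection by date range and bounding box (west, south, east, north)
query = earthaccess.granule_query().concept_id(
    'C2153572614-NSIDC_CPRD'
).temporal(
    "2020-03-01", "2020-03-30"
).bounding_box(
    -134.7, 58.9, -133.9, 59.2)
granules = query.get(4)  # return at most 4 granules

# Open the granules as file-like objects
files = earthaccess.open(granules)

# Read one ground-track group from the second file
ds = xr.open_dataset(files[1], group='/gt1l/land_ice_segments')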
ds
# Start to do awesome science
FAIR: Findable, Accessible, Interoperable, Reusable
Applies to the future you and your team as well.
Does everyone on your team know where the data is?
Can they access it?
Helpful to document this somewhere.
Keep raw data raw!
Save intermediate data, not just final versions.
Use consistent and descriptive folder and file name patterns.
$ tree Data
Data
├── calibrated
├── cleaned
├── figures
├── final
├── monthly_averages
├── raw
└── results
7 directories, 0 files
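A folder layout and file name pattern like this can be generated from code, so that every team member ends up with the same structure. The following is a minimal sketch, not a prescribed layout: the stage names are taken from the listing above, while `make_project_dirs`, `build_filename`, and the `variable_stage_start_end_vNN.ext` pattern are hypothetical choices made for illustration.

```python
import tempfile
from datetime import date
from pathlib import Path

# Processing stages, matching the folder listing above
STAGES = ["raw", "calibrated", "cleaned", "monthly_averages",
          "final", "figures", "results"]


def make_project_dirs(root):
    """Create one folder per processing stage under root; return their names."""
    root = Path(root)
    for stage in STAGES:
        (root / stage).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def build_filename(variable, stage, start, end, version, ext="nc"):
    """Build a descriptive, sortable file name (illustrative pattern only)."""
    return f"{variable}_{stage}_{start:%Y%m%d}_{end:%Y%m%d}_v{version:02d}.{ext}"


# Demonstrate in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    print(make_project_dirs(Path(tmp) / "Data"))

print(build_filename("height", "cleaned", date(2020, 3, 1), date(2020, 3, 30), 1))
# height_cleaned_20200301_20200330_v01.nc
```

Putting dates in `YYYYMMDD` form and zero-padding the version number keeps files in chronological order when they are listed alphabetically.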
Avoid Excel and other proprietary formats.
Metadata standards and conventions ensure that standard tools can read/interpret the data.
Standards also define the meaning of metadata attributes.
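As a rough illustration of how agreed attribute names make metadata checkable, the sketch below flags variables that lack descriptive attributes. The attribute names `units`, `long_name`, and `standard_name` are used by the CF conventions, but treating all three as required is an assumption made for this example, and `missing_attributes` is a hypothetical helper, not part of any library.

```python
# Attribute names used by the CF conventions; the choice of which ones
# to check here is an illustrative assumption, not the full standard.
RECOMMENDED_ATTRS = ("units", "long_name", "standard_name")


def missing_attributes(attrs, recommended=RECOMMENDED_ATTRS):
    """Return the recommended attribute names that are absent or empty."""
    return [name for name in recommended
            if not str(attrs.get(name, "")).strip()]


# Hypothetical metadata for two variables
variables = {
    "h_li": {"units": "m", "long_name": "land ice height"},
    "latitude": {"units": "degrees_north", "long_name": "latitude",
                 "standard_name": "latitude"},
}

for name, attrs in variables.items():
    print(name, missing_attributes(attrs))
# h_li ['standard_name']
# latitude []
```

With xarray, the same check could be applied to each variable's `attrs` dictionary.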
Document each step.
Can you (or anyone else) easily reproduce your processing pipeline?
With GUI tools (e.g. ArcGIS, QGIS, Excel), take screenshots and keep a journal of the commands you run.
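For scripted pipelines, one lightweight way to document each step is to append a record to a log file every time a step runs. This is only a sketch under assumptions: `log_step`, the JSON Lines log format, and the file and step names in the demo are all hypothetical, and a workflow tool or notebook could serve the same purpose.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_step(log_path, step, inputs, outputs, **params):
    """Append one JSON record describing a processing step to log_path."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "inputs": [str(p) for p in inputs],
        "outputs": [str(p) for p in outputs],
        "parameters": params,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


# Demonstrate with a temporary log file and hypothetical file names
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "processing_log.jsonl"
    log_step(log, "calibrate", ["raw/a.nc"], ["calibrated/a.nc"], offset=0.5)
    log_step(log, "clean", ["calibrated/a.nc"], ["cleaned/a.nc"], max_gap=3)
    for line in log.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        print(entry["step"], entry["inputs"], "->", entry["outputs"])
```

Because each record names its inputs, outputs, and parameters, the log can later be read back to retrace how a final file was produced.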