PACE Level 3
The Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) Level-3 products provide globally gridded data derived from Level-2 swath observations. Satellite measurements are spatially and temporally aggregated (daily, 8-day, monthly, seasonal) onto regular latitude–longitude grids at two resolutions (~4 km and 0.1°). These products include ocean color variables such as chlorophyll-a, diffuse attenuation (Kd), and hyperspectral remote sensing reflectance (Rrs), along with derived biogeochemical indicators. For this notebook, we use:
- PACE_OCI_L3M_Rrs
- PACE_OCI_L3M_AVW
Steps:
- Create a plan for the files to use: `pc.plan()`
- Print the plan to check it: `print(plan.summary())`
- Get matchups: `pc.matchup(plan)`
Note: In a virtual machine in AWS us-west-2, where the NASA cloud data is hosted, point matchups are fast. In Colab, by contrast, your compute is in a different region and provider (Google versus AWS), and the same matchups might take 10x longer. So if you have big matchup tasks (tens of thousands of points), it is wise to run them in AWS us-west-2.
Prerequisites
<earthaccess.auth.Auth at 0x7f4675d2bd10>
Read in some points
import pandas as pd
time = "2025-04-09"
lat = 30.0
lon = -89.0
df = pd.DataFrame(
{
"lat": [lat],
"lon": [lon],
"time": [time],
}
)
df
|   | lat | lon | time |
|---|---|---|---|
| 0 | 30.0 | -89.0 | 2025-04-09 |
Create a plan
%%time
import point_collocation as pc
plan = pc.plan(
df,
data_source="earthaccess",
source_kwargs={
"short_name": "PACE_OCI_L3M_Rrs",
"granule_name": "*.8D.*.4km.*",
}
)
CPU times: user 613 ms, sys: 83.7 ms, total: 697 ms
Wall time: 2.14 s
Plan: 1 points → 1 unique granule(s)
Points with 0 matches : 0
Points with >1 matches: 0
Time buffer: 0 days 00:00:00
First 1 point(s):
[0] lat=30.0000, lon=-89.0000, time=2025-04-09 00:00:00: 1 match(es)
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250407_20250414.L3m.8D.RRS.V3_1.Rrs.4km.nc
Look at variables in that dataset
We will open a granule with `plan.open_dataset(0)` and inspect it.
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
Points columns used: y='lat', x='lon', time='time'
CPU times: user 1.6 s, sys: 310 ms, total: 1.91 s
Wall time: 7.17 s
<xarray.Dataset> Size: 26GB
Dimensions: (lat: 4320, lon: 8640, wavelength: 172, rgb: 3,
eightbitcolor: 256)
Coordinates:
* lat (lat) float32 17kB 89.98 89.94 89.9 ... -89.9 -89.94 -89.98
* lon (lon) float32 35kB -180.0 -179.9 -179.9 ... 179.9 179.9 180.0
* wavelength (wavelength) float64 1kB 346.0 348.0 351.0 ... 714.0 717.0 719.0
Dimensions without coordinates: rgb, eightbitcolor
Data variables:
Rrs (lat, lon, wavelength) float32 26GB dask.array<chunksize=(16, 1024, 8), meta=np.ndarray>
palette (rgb, eightbitcolor) uint8 768B dask.array<chunksize=(3, 256), meta=np.ndarray>
Attributes: (12/64)
product_name: PACE_OCI.20250407_20250414.L3m.8D.RRS....
instrument: OCI
title: OCI Level-3 Standard Mapped Image
project: Ocean Biology Processing Group (NASA/G...
platform: PACE
source: satellite observations from OCI-PACE
... ...
identifier_product_doi: 10.5067/PACE/OCI/L3M/RRS/3.1
keywords: Earth Science > Oceans > Ocean Optics ...
keywords_vocabulary: NASA Global Change Master Directory (G...
data_bins: 13688913
data_minimum: -0.009997999
data_maximum: 0.09860174
Get the matchups
For variables with a third dimension, like wavelength, each variable is returned as separate columns suffixed with the third-dimension value (e.g. `Rrs_346`). The lat, lon, and time of the matching granule are added as columns. `pc_id` is the point id (row) from the data you are matching; it is included in case there are multiple granules (files) per data point.
CPU times: user 669 ms, sys: 82.5 ms, total: 751 ms
Wall time: 1.38 s
|   | lat | lon | time | pc_id | granule_id | granule_time | granule_lat | granule_lon | Rrs_346 | Rrs_348 | ... | Rrs_706 | Rrs_707 | Rrs_708 | Rrs_709 | Rrs_711 | Rrs_712 | Rrs_713 | Rrs_714 | Rrs_717 | Rrs_719 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30.0 | -89.0 | 2025-04-09 | 0 | https://obdaac-tea.earthdatacloud.nasa.gov/ob-... | 2025-04-10 23:59:59+00:00 | 30.020832 | -89.020828 | 0.000306 | 0.000488 | ... | 0.003598 | 0.003496 | 0.003386 | 0.003268 | 0.003138 | 0.003004 | 0.00286 | 0.002662 | 0.002098 | 0.001644 |
1 rows × 180 columns
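The wide, suffixed column naming above can be mimicked with a small pandas sketch (the wavelengths and values here are illustrative stand-ins, not taken from the granule):

```python
import pandas as pd

# Illustrative values only: one point's spectrum keyed by wavelength (nm)
spectrum = {346: 0.000306, 348: 0.000488, 719: 0.001644}

# Flatten into Rrs_<wavelength> columns, mirroring the matchup output
row = {f"Rrs_{wl}": val for wl, val in spectrum.items()}
df_wide = pd.DataFrame([row])
print(list(df_wide.columns))  # ['Rrs_346', 'Rrs_348', 'Rrs_719']
```

Each wavelength becomes its own `Rrs_<value>` column, which is why the single-point result above has 180 columns: 172 wavelengths plus the point and granule metadata columns.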
What if you only want some Rrs wavelengths?
You can filter the dataframe.
|   | lat | lon | time | Rrs_348 | Rrs_711 |
|---|---|---|---|---|---|
| 0 | 30.0 | -89.0 | 2025-04-09 | 0.000488 | 0.003138 |
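A minimal pandas sketch of that filtering, using a small stand-in for the matchup result (`res` here has only a few columns, not the real 180-column output):

```python
import pandas as pd

# Stand-in for a matchup result with many Rrs_<wavelength> columns
res = pd.DataFrame({
    "lat": [30.0], "lon": [-89.0], "time": ["2025-04-09"],
    "Rrs_346": [0.000306], "Rrs_348": [0.000488], "Rrs_711": [0.003138],
})

# Keep the point columns plus only the wavelengths of interest,
# selecting the Rrs columns with a regex filter
subset = pd.concat(
    [res[["lat", "lon", "time"]], res.filter(regex=r"^Rrs_(348|711)$")],
    axis=1,
)
print(list(subset.columns))  # ['lat', 'lon', 'time', 'Rrs_348', 'Rrs_711']
```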
Also match on wavelength
point-collocation is designed to match lat/lon/time, but you can also match other coordinates that appear in the data; depth and wavelength are common examples. For wavelength, the filtering above probably makes the most sense, but imagine that you wanted different wavelengths for different locations. To do this, we need two things:
- The additional coordinate as a column in our dataframe.
- A `coord_spec` dict that says which extra coordinate we want to match.
*Note:* In an xarray Dataset, you will see the coordinates in the data variable listing, like `Rrs (lat, lon, wavelength)`. Inside the parentheses are the coordinates for that variable.
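Conceptually, matching on an extra coordinate is a nearest-neighbor lookup along that dimension. A NumPy sketch of the idea (an illustration only, not point-collocation's internal implementation; the wavelength values are made up):

```python
import numpy as np

# Granule wavelength coordinate (nm) and the per-point target wavelengths
wavelengths = np.array([346.0, 348.0, 400.0, 700.0, 719.0])
wave_points = np.array([400, 700])

# Nearest-neighbor index along the wavelength dimension for each point
idx = np.abs(wavelengths[None, :] - wave_points[:, None]).argmin(axis=1)
print(wavelengths[idx])  # [400. 700.]
```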
import pandas as pd
df = pd.DataFrame(
{
"lat": [30.0, 31.0],
"lon": [-89.0, -70.0],
"time": ["2025-04-09", "2025-04-09"],
"wave": [400,700]
}
)
df
|   | lat | lon | time | wave |
|---|---|---|---|---|
| 0 | 30.0 | -89.0 | 2025-04-09 | 400 |
| 1 | 31.0 | -70.0 | 2025-04-09 | 700 |
Create our coord_spec
You will need to look at the dataset with plan.open_dataset(0) to see what the coordinates are called in the source.
# Add wavelength as something we can match
coord_spec = {
"wavelength": {"source": "wavelength", "points": "wave"}
}
Now we make a plan and pass the coord spec to `pc.matchup`
`Rrs` in the output is evaluated only at the wavelength given in the `wave` column for each point.
import point_collocation as pc
plan = pc.plan(
df,
data_source="earthaccess",
source_kwargs={
"short_name": "PACE_OCI_L3M_Rrs",
"granule_name": "*.8D.*.4km.*",
}
)
res = pc.matchup(plan, variables=["Rrs"], coord_spec=coord_spec)
res
|   | lat | lon | time | wave | pc_id | granule_id | granule_time | granule_lat | granule_lon | Rrs |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30.0 | -89.0 | 2025-04-09 | 400 | 0 | https://obdaac-tea.earthdatacloud.nasa.gov/ob-... | 2025-04-10 23:59:59+00:00 | 30.020832 | -89.020828 | 0.002198 |
| 1 | 31.0 | -70.0 | 2025-04-09 | 700 | 1 | https://obdaac-tea.earthdatacloud.nasa.gov/ob-... | 2025-04-10 23:59:59+00:00 | 31.020832 | -70.020828 | 0.000276 |
Data variables with only (lat, lon) dimensions
In this case, just the variable name appears in the returned dataframe, with no `_xxx` suffix.
%%time
import point_collocation as pc
plan = pc.plan(
df,
data_source="earthaccess",
source_kwargs={
"short_name": "PACE_OCI_L3M_AVW",
"granule_name": "*.DAY.*.4km.*",
}
)
res = pc.matchup(plan, variables=["avw"])
res
CPU times: user 522 ms, sys: 0 ns, total: 522 ms
Wall time: 8.44 s
|   | lat | lon | time | pc_id | granule_id | granule_time | granule_lat | granule_lon | avw |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 30.0 | -89.0 | 2025-04-09 | 0 | https://obdaac-tea.earthdatacloud.nasa.gov/ob-... | 2025-04-09 11:59:59+00:00 | 30.020832 | -89.020828 | 547.153259 |
Plan with many files
If you are not sure what files to use, you can use a short name without `granule_name`, then look at the plan summary to see the file names. You only need one point (n=1) for this. In this example, 16 files match: 2 spatial resolutions (4 km and 0.1°) times 8 temporal windows:
- R32: rolling 32-day windows starting every 7 days (4 windows here)
- SNSP: seasonal (quarterly)
- 8D: 8-day
- DAY: daily
- MO: monthly, from the first to the last day of each month
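These codes are embedded in the file names. A quick string sketch to pull the temporal-resolution code out of a granule name (the `L3m.<PERIOD>` position is inferred from the URLs shown below, not from a documented spec):

```python
# Parse the temporal-resolution code out of an L3m file name
name = "PACE_OCI.20250407_20250414.L3m.8D.AVW.V3_1.avw.4km.nc"
parts = name.split(".")
period = parts[parts.index("L3m") + 1]
print(period)  # 8D
```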
%%time
import point_collocation as pc
plan = pc.plan(
df,
data_source="earthaccess",
source_kwargs={
"short_name": "PACE_OCI_L3M_AVW",
}
)
CPU times: user 12.3 ms, sys: 7.95 ms, total: 20.2 ms
Wall time: 9.71 s
Plan: 1 points → 16 unique granule(s)
Points with 0 matches : 0
Points with >1 matches: 1
Time buffer: 0 days 00:00:00
First 1 point(s):
[0] lat=30.0000, lon=-89.0000, time=2025-04-09 00:00:00: 16 match(es)
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250314_20250414.L3m.R32.AVW.V3_1.avw.0p1deg.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250314_20250414.L3m.R32.AVW.V3_1.avw.4km.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250321_20250620.L3m.SNSP.AVW.V3_1.avw.0p1deg.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250321_20250620.L3m.SNSP.AVW.V3_1.avw.4km.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250322_20250422.L3m.R32.AVW.V3_1.avw.0p1deg.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250322_20250422.L3m.R32.AVW.V3_1.avw.4km.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250330_20250430.L3m.R32.AVW.V3_1.avw.0p1deg.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250330_20250430.L3m.R32.AVW.V3_1.avw.4km.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250401_20250430.L3m.MO.AVW.V3_1.avw.0p1deg.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250401_20250430.L3m.MO.AVW.V3_1.avw.4km.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250407_20250414.L3m.8D.AVW.V3_1.avw.4km.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250407_20250414.L3m.8D.AVW.V3_1.avw.0p1deg.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250407_20250508.L3m.R32.AVW.V3_1.avw.4km.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250407_20250508.L3m.R32.AVW.V3_1.avw.0p1deg.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250409.L3m.DAY.AVW.V3_1.avw.0p1deg.nc
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250409.L3m.DAY.AVW.V3_1.avw.4km.nc
Filter to the files you want
Once you see the file names, you can filter to the ones you want using `granule_name`. For example, `*.SNSP.*.4km.*` gets the seasonal (quarterly) values; `*` is a wildcard.
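The patterns behave like shell-style globs (an assumption based on the examples; `fnmatch` reproduces that behavior here), using file names copied from the plan summary above:

```python
from fnmatch import fnmatch

# File names from the plan summary above
names = [
    "PACE_OCI.20250409.L3m.DAY.AVW.V3_1.avw.4km.nc",
    "PACE_OCI.20250409.L3m.DAY.AVW.V3_1.avw.0p1deg.nc",
    "PACE_OCI.20250321_20250620.L3m.SNSP.AVW.V3_1.avw.4km.nc",
]

# Keep only the seasonal 4 km file
seasonal_4km = [n for n in names if fnmatch(n, "*.SNSP.*.4km.*")]
print(seasonal_4km)  # just the SNSP 4km name
```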
%%time
import point_collocation as pc
plan = pc.plan(
df,
data_source="earthaccess",
source_kwargs={
"short_name": "PACE_OCI_L3M_AVW",
"granule_name": "*.SNSP.*.4km.*"
}
)
CPU times: user 20 ms, sys: 0 ns, total: 20 ms
Wall time: 473 ms
Plan: 1 points → 1 unique granule(s)
Points with 0 matches : 0
Points with >1 matches: 0
Time buffer: 0 days 00:00:00
First 1 point(s):
[0] lat=30.0000, lon=-89.0000, time=2025-04-09 00:00:00: 1 match(es)
→ https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250321_20250620.L3m.SNSP.AVW.V3_1.avw.4km.nc
Try many points
import pandas as pd
url = (
"https://raw.githubusercontent.com/"
"fish-pace/point-collocation/main/"
"examples/fixtures/points.csv"
)
df_points = pd.read_csv(url)
print(len(df_points))
# Let's add on our own pc_id column
df_points = df_points.reset_index(drop=True)
df_points["pc_id"] = df_points.index + 1
df_points["pc_label"] = "pace_" + df_points["pc_id"].astype(str)
df_points.head()
595
|   | lat | lon | date | pc_id | pc_label |
|---|---|---|---|---|---|
| 0 | 27.3835 | -82.7375 | 2024-06-13 | 1 | pace_1 |
| 1 | 27.1190 | -82.7125 | 2024-06-14 | 2 | pace_2 |
| 2 | 26.9435 | -82.8170 | 2024-06-14 | 3 | pace_3 |
| 3 | 26.6875 | -82.8065 | 2024-06-14 | 4 | pace_4 |
| 4 | 26.6675 | -82.6455 | 2024-06-14 | 5 | pace_5 |
Get a plan for matchups from PACE data
For this example, we will just get a plan for the first 100 points so that it runs quickly.
%%time
import point_collocation as pc
plan = pc.plan(
df_points[0:100],
data_source="earthaccess",
source_kwargs={
"short_name": "PACE_OCI_L3M_AVW",
"granule_name": "*.DAY.*.4km.*",
}
)
CPU times: user 40.8 ms, sys: 1.51 ms, total: 42.3 ms
Wall time: 580 ms
Plan: 100 points → 18 unique granule(s)
Points with 0 matches : 0
Points with >1 matches: 0
Time buffer: 0 days 00:00:00
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
<xarray.Dataset> Size: 149MB
Dimensions: (lat: 4320, lon: 8640, rgb: 3, eightbitcolor: 256)
Coordinates:
* lat (lat) float32 17kB 89.98 89.94 89.9 89.85 ... -89.9 -89.94 -89.98
* lon (lon) float32 35kB -180.0 -179.9 -179.9 ... 179.9 179.9 180.0
Dimensions without coordinates: rgb, eightbitcolor
Data variables:
avw (lat, lon) float32 149MB dask.array<chunksize=(512, 1024), meta=np.ndarray>
palette (rgb, eightbitcolor) uint8 768B dask.array<chunksize=(3, 256), meta=np.ndarray>
Attributes: (12/62)
product_name: PACE_OCI.20240613.L3m.DAY.AVW.V3_1.avw...
instrument: OCI
title: OCI Level-3 Standard Mapped Image
project: Ocean Biology Processing Group (NASA/G...
platform: PACE
source: satellite observations from OCI-PACE
... ...
cdm_data_type: grid
identifier_product_doi_authority: http://dx.doi.org
identifier_product_doi: 10.5067/PACE/OCI/L3M/AVW/3.1
data_bins: 3381968
data_minimum: 400.02658
data_maximum: 699.80536
Get 100 matchups using that plan
In a virtual machine in AWS us-west-2, where the NASA cloud data is hosted, this takes about 12 seconds. In Colab, it might take over a minute, since your compute is not in the same data region or provider (Google versus AWS).
CPU times: user 5.3 s, sys: 155 ms, total: 5.45 s
Wall time: 12.7 s
|   | lat | lon | time | pc_id | pc_label | granule_id | granule_time | granule_lat | granule_lon | avw |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27.3835 | -82.7375 | 2024-06-13 12:00:00 | 1 | pace_1 | https://obdaac-tea.earthdatacloud.nasa.gov/ob-... | 2024-06-13 11:59:59+00:00 | 27.395832 | -82.729164 | NaN |
| 1 | 27.1190 | -82.7125 | 2024-06-14 12:00:00 | 2 | pace_2 | https://obdaac-tea.earthdatacloud.nasa.gov/ob-... | 2024-06-14 11:59:59+00:00 | 27.104164 | -82.729164 | NaN |
| 2 | 26.9435 | -82.8170 | 2024-06-14 12:00:00 | 3 | pace_3 | https://obdaac-tea.earthdatacloud.nasa.gov/ob-... | 2024-06-14 11:59:59+00:00 | 26.937498 | -82.812500 | NaN |
| 3 | 26.6875 | -82.8065 | 2024-06-14 12:00:00 | 4 | pace_4 | https://obdaac-tea.earthdatacloud.nasa.gov/ob-... | 2024-06-14 11:59:59+00:00 | 26.687498 | -82.812500 | NaN |
| 4 | 26.6675 | -82.6455 | 2024-06-14 12:00:00 | 5 | pace_5 | https://obdaac-tea.earthdatacloud.nasa.gov/ob-... | 2024-06-14 11:59:59+00:00 | 26.687498 | -82.645828 | NaN |
Try lots of products
Pick a recent data point so the NRT (near-real-time) collections work. Not all products have files.
import pandas as pd
time = "2026-01-09"
lat = 30.0
lon = -89.0
df = pd.DataFrame(
{
"lat": [lat],
"lon": [lon],
"time": [time],
}
)
df["time"] = pd.to_datetime(df["time"])
import earthaccess
results = earthaccess.search_datasets(instrument="oci")
short_names = [
item.summary()["short-name"]
for item in results
if "L3M" in item.summary()["short-name"]
]
print(short_names)
['PACE_OCI_L3M_UVAI_UAA_NRT', 'PACE_OCI_L3M_UVAI_UAA', 'PACE_OCI_L3M_AER_UAA_NRT', 'PACE_OCI_L3M_AER_UAA', 'PACE_OCI_L3M_AVW_NRT', 'PACE_OCI_L3M_AVW', 'PACE_OCI_L3M_CHL_NRT', 'PACE_OCI_L3M_CHL', 'PACE_OCI_L3M_CLOUD_MASK_NRT', 'PACE_OCI_L3M_CLOUD_MASK', 'PACE_OCI_L3M_CLOUD_NRT', 'PACE_OCI_L3M_CLOUD', 'PACE_OCI_L3M_KD_NRT', 'PACE_OCI_L3M_KD', 'PACE_OCI_L3M_FLH_NRT', 'PACE_OCI_L3M_FLH', 'PACE_OCI_L3M_LANDVI_NRT', 'PACE_OCI_L3M_LANDVI', 'PACE_OCI_L3M_IOP_NRT', 'PACE_OCI_L3M_IOP', 'PACE_OCI_L3M_POC_NRT', 'PACE_OCI_L3M_POC', 'PACE_OCI_L3M_PAR_NRT', 'PACE_OCI_L3M_PAR', 'PACE_OCI_L3M_CARBON', 'PACE_OCI_L3M_CARBON_NRT', 'PACE_OCI_L3M_RRS_NRT', 'PACE_OCI_L3M_RRS', 'PACE_OCI_L3M_SFREFL_NRT', 'PACE_OCI_L3M_SFREFL', 'PACE_OCI_L3M_TRGAS_NRT', 'PACE_OCI_L3M_TRGAS']
%%time
# Confirm works for all L3 products
# Good. PACE_OCI_L3M_TRGAS is slow.
import point_collocation as pc
for short_name in short_names:
print(f"\n===== {short_name} =====")
try:
plan = pc.plan(
df,
data_source="earthaccess",
source_kwargs={
"short_name": short_name,
"granule_name":"*.DAY.*",
}
)
plan.open_dataset(0)
except Exception as e:
print("Failed:", e)
===== PACE_OCI_L3M_UVAI_UAA_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_UVAI_UAA =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_AER_UAA_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_AER_UAA =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
===== PACE_OCI_L3M_AVW_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_AVW =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
===== PACE_OCI_L3M_CHL_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_CHL =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
===== PACE_OCI_L3M_CLOUD_MASK_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_CLOUD_MASK =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_CLOUD_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_CLOUD =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
===== PACE_OCI_L3M_KD_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_KD =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
===== PACE_OCI_L3M_FLH_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_FLH =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
===== PACE_OCI_L3M_LANDVI_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_LANDVI =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
===== PACE_OCI_L3M_IOP_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_IOP =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
===== PACE_OCI_L3M_POC_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_POC =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
===== PACE_OCI_L3M_PAR_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_PAR =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
===== PACE_OCI_L3M_CARBON =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
===== PACE_OCI_L3M_CARBON_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_RRS_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_RRS =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
===== PACE_OCI_L3M_SFREFL_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_SFREFL =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
===== PACE_OCI_L3M_TRGAS_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.
===== PACE_OCI_L3M_TRGAS =====
CPU times: user 2.81 s, sys: 224 ms, total: 3.04 s
Wall time: 5min 28s
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
KeyboardInterrupt: interrupted while opening PACE_OCI_L3M_TRGAS during a stalled S3 range read (fsspec/s3fs fetch inside xarray's h5netcdf backend)