Open in Colab

PACE Level 3

The Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) Level-3 products provide globally gridded fields derived from Level-2 swath observations. Satellite measurements are spatially and temporally aggregated (daily, 8-day, monthly, seasonal) onto regular latitude–longitude grids at two resolutions (~4 km and 0.1°). These products include ocean color variables such as chlorophyll-a, diffuse attenuation (Kd), and hyperspectral remote sensing reflectance (Rrs), along with derived biogeochemical indicators. For this notebook, we use:

  • PACE_OCI_L3M_Rrs
  • PACE_OCI_L3M_AVW

Steps:

  • Create a plan of the files to use: pc.plan()
  • Print the plan to check it: print(plan.summary())
  • Get the matchups: pc.matchup(plan)

Note: In a virtual machine in AWS us-west-2, where the NASA cloud data is hosted, point matchups are fast. In Colab, say, your compute is not in the same region or provider (Google versus AWS), and the same matchups might take 10x longer. Thus, if you have big matchup tasks (tens of thousands of points), it is wise to run them in AWS us-west-2.

Prerequisites

# install if needed
!pip install point-collocation --quiet
# Make sure you are logged in
import earthaccess
earthaccess.login()
<earthaccess.auth.Auth at 0x7f4675d2bd10>

Read in some points

import pandas as pd
time = "2025-04-09"
lat = 30.0
lon = -89.0

df = pd.DataFrame(
    {
        "lat": [lat],
        "lon": [lon],
        "time": [time],
    }
)
df
lat lon time
0 30.0 -89.0 2025-04-09

Create a plan

%%time
import point_collocation as pc
plan = pc.plan(
    df,
    data_source="earthaccess",
    source_kwargs={
        "short_name": "PACE_OCI_L3M_Rrs",
        "granule_name": "*.8D.*.4km.*",
    }
)
CPU times: user 613 ms, sys: 83.7 ms, total: 697 ms
Wall time: 2.14 s
plan.summary()
Plan: 1 points → 1 unique granule(s)
  Points with 0 matches : 0
  Points with >1 matches: 0
  Time buffer: 0 days 00:00:00

First 1 point(s):
  [0] lat=30.0000, lon=-89.0000, time=2025-04-09 00:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250407_20250414.L3m.8D.RRS.V3_1.Rrs.4km.nc

Look at variables in that dataset

We will open a granule and inspect it.

%%time
plan.open_dataset(0)
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
Points columns used: y='lat', x='lon', time='time'
CPU times: user 1.6 s, sys: 310 ms, total: 1.91 s
Wall time: 7.17 s
<xarray.Dataset> Size: 26GB
Dimensions:     (lat: 4320, lon: 8640, wavelength: 172, rgb: 3,
                 eightbitcolor: 256)
Coordinates:
  * lat         (lat) float32 17kB 89.98 89.94 89.9 ... -89.9 -89.94 -89.98
  * lon         (lon) float32 35kB -180.0 -179.9 -179.9 ... 179.9 179.9 180.0
  * wavelength  (wavelength) float64 1kB 346.0 348.0 351.0 ... 714.0 717.0 719.0
Dimensions without coordinates: rgb, eightbitcolor
Data variables:
    Rrs         (lat, lon, wavelength) float32 26GB dask.array<chunksize=(16, 1024, 8), meta=np.ndarray>
    palette     (rgb, eightbitcolor) uint8 768B dask.array<chunksize=(3, 256), meta=np.ndarray>
Attributes: (12/64)
    product_name:                      PACE_OCI.20250407_20250414.L3m.8D.RRS....
    instrument:                        OCI
    title:                             OCI Level-3 Standard Mapped Image
    project:                           Ocean Biology Processing Group (NASA/G...
    platform:                          PACE
    source:                            satellite observations from OCI-PACE
    ...                                ...
    identifier_product_doi:            10.5067/PACE/OCI/L3M/RRS/3.1
    keywords:                          Earth Science > Oceans > Ocean Optics ...
    keywords_vocabulary:               NASA Global Change Master Directory (G...
    data_bins:                         13688913
    data_minimum:                      -0.009997999
    data_maximum:                      0.09860174

Get the matchups

For variables with a third dimension, like wavelength, each variable is split into one column per value of that dimension, suffixed with the value (e.g. Rrs_346). The lat, lon, and time of the matching granule are added as columns. pc_id is the point id (row) from the data you are matching; it is added in case there are multiple granules (files) per data point.

%%time
res = pc.matchup(plan, variables=["Rrs"])
res
CPU times: user 669 ms, sys: 82.5 ms, total: 751 ms
Wall time: 1.38 s
lat lon time pc_id granule_id granule_time granule_lat granule_lon Rrs_346 Rrs_348 ... Rrs_706 Rrs_707 Rrs_708 Rrs_709 Rrs_711 Rrs_712 Rrs_713 Rrs_714 Rrs_717 Rrs_719
0 30.0 -89.0 2025-04-09 0 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2025-04-10 23:59:59+00:00 30.020832 -89.020828 0.000306 0.000488 ... 0.003598 0.003496 0.003386 0.003268 0.003138 0.003004 0.00286 0.002662 0.002098 0.001644

1 rows × 180 columns

What if you only want some Rrs wavelengths?

You can filter the dataframe.

res[['lat', 'lon', 'time', 'Rrs_348', 'Rrs_711']]
lat lon time Rrs_348 Rrs_711
0 30.0 -89.0 2025-04-09 0.000488 0.003138
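Another option is to reshape the wide Rrs_xxx columns into a long-format spectrum, which is handy for plotting Rrs against wavelength. A minimal sketch using pandas melt, with a toy two-column dataframe standing in for the res above:

```python
import pandas as pd

# Toy stand-in for the matchup result: two wavelength columns.
res = pd.DataFrame(
    {
        "lat": [30.0],
        "lon": [-89.0],
        "Rrs_348": [0.000488],
        "Rrs_711": [0.003138],
    }
)

# Melt the Rrs_* columns into (wavelength, Rrs) rows.
rrs_cols = [c for c in res.columns if c.startswith("Rrs_")]
long_df = res.melt(
    id_vars=["lat", "lon"],
    value_vars=rrs_cols,
    var_name="wavelength",
    value_name="Rrs",
)
# Strip the "Rrs_" prefix so wavelength becomes numeric.
long_df["wavelength"] = (
    long_df["wavelength"].str.replace("Rrs_", "", regex=False).astype(int)
)
print(long_df)
```

With the full res from above, the same pattern yields one row per wavelength, ready for long_df.plot(x="wavelength", y="Rrs").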

Also match on wavelength

point-collocation is designed to match lat/lon/time, but you can also match other coordinates that appear in the data; depth and wavelength are common examples. For wavelength, the filtering above probably makes the most sense, but imagine that you wanted different wavelengths for different locations. To do this, we need 2 things:

  • The additional coordinate as a column in our dataframe.
  • A coord_spec dict that says which extra coordinate we want to match.

Note: In an xarray Dataset, you will see the coordinates in the data variable listing, like Rrs (lat, lon, wavelength). The names inside the parentheses are the coordinates for that variable.

import pandas as pd
df = pd.DataFrame(
    {
        "lat": [30.0, 31.0],
        "lon": [-89.0, -70.0],
        "time": ["2025-04-09", "2025-04-09"],
        "wave": [400,700]
    }
)
df
lat lon time wave
0 30.0 -89.0 2025-04-09 400
1 31.0 -70.0 2025-04-09 700

Create our coord_spec

You will need to look at the dataset with plan.open_dataset(0) to see what the coordinates are called in the source.

# Add wavelength as something we can match
coord_spec = {
    "wavelength": {"source": "wavelength", "points": "wave"}
}

Now we make a plan and pass the coord_spec to matchup

Rrs in the output is only for the wavelength given in the wave column.

import point_collocation as pc
plan = pc.plan(
    df,
    data_source="earthaccess",
    source_kwargs={
        "short_name": "PACE_OCI_L3M_Rrs",
        "granule_name": "*.8D.*.4km.*",
    }
)
res = pc.matchup(plan, variables=["Rrs"], coord_spec=coord_spec)
res
lat lon time wave pc_id granule_id granule_time granule_lat granule_lon Rrs
0 30.0 -89.0 2025-04-09 400 0 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2025-04-10 23:59:59+00:00 30.020832 -89.020828 0.002198
1 31.0 -70.0 2025-04-09 700 1 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2025-04-10 23:59:59+00:00 31.020832 -70.020828 0.000276

Data variables with only (lat, lon) dimensions

In this case, just the variable name appears in the returned dataframe, with no _xxx suffix.

%%time
import point_collocation as pc
plan = pc.plan(
    df,
    data_source="earthaccess",
    source_kwargs={
        "short_name": "PACE_OCI_L3M_AVW",
        "granule_name": "*.DAY.*.4km.*",
    }
)
res = pc.matchup(plan, variables=["avw"])
res
CPU times: user 522 ms, sys: 0 ns, total: 522 ms
Wall time: 8.44 s
lat lon time pc_id granule_id granule_time granule_lat granule_lon avw
0 30.0 -89.0 2025-04-09 0 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2025-04-09 11:59:59+00:00 30.020832 -89.020828 547.153259

Plan with many files

If you are not sure which files to use, you can use a short_name without granule_name, then look at the plan summary to see the file names. You only need to look at one point (n=1). In this example, 16 files match: 2 spatial resolutions (4 km and 0.1 deg) times 8 time windows:

  • R32: rolling 32 days, starting every 7 days (4 windows overlap this date)
  • SNSP: seasonal/quarterly
  • 8D: 8-day
  • DAY: daily
  • MO: monthly, from the 1st day of each month to the last
%%time
import point_collocation as pc
plan = pc.plan(
    df,
    data_source="earthaccess",
    source_kwargs={
        "short_name": "PACE_OCI_L3M_AVW",
    }
)
CPU times: user 12.3 ms, sys: 7.95 ms, total: 20.2 ms
Wall time: 9.71 s
plan.summary(n=1)
Plan: 1 points → 16 unique granule(s)
  Points with 0 matches : 0
  Points with >1 matches: 1
  Time buffer: 0 days 00:00:00

First 1 point(s):
  [0] lat=30.0000, lon=-89.0000, time=2025-04-09 00:00:00: 16 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250314_20250414.L3m.R32.AVW.V3_1.avw.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250314_20250414.L3m.R32.AVW.V3_1.avw.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250321_20250620.L3m.SNSP.AVW.V3_1.avw.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250321_20250620.L3m.SNSP.AVW.V3_1.avw.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250322_20250422.L3m.R32.AVW.V3_1.avw.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250322_20250422.L3m.R32.AVW.V3_1.avw.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250330_20250430.L3m.R32.AVW.V3_1.avw.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250330_20250430.L3m.R32.AVW.V3_1.avw.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250401_20250430.L3m.MO.AVW.V3_1.avw.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250401_20250430.L3m.MO.AVW.V3_1.avw.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250407_20250414.L3m.8D.AVW.V3_1.avw.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250407_20250414.L3m.8D.AVW.V3_1.avw.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250407_20250508.L3m.R32.AVW.V3_1.avw.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250407_20250508.L3m.R32.AVW.V3_1.avw.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250409.L3m.DAY.AVW.V3_1.avw.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250409.L3m.DAY.AVW.V3_1.avw.4km.nc
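The time window code and spatial resolution are embedded in these dot-separated file names, so you can summarize a long plan by splitting on dots. A small sketch (file names taken from the summary above; the field positions are an assumption based on the OB.DAAC naming pattern):

```python
# Example granule names from the plan summary above.
names = [
    "PACE_OCI.20250407_20250414.L3m.8D.AVW.V3_1.avw.4km.nc",
    "PACE_OCI.20250409.L3m.DAY.AVW.V3_1.avw.0p1deg.nc",
    "PACE_OCI.20250321_20250620.L3m.SNSP.AVW.V3_1.avw.4km.nc",
]

def parse_granule(name):
    # Fields are dot-separated:
    # PACE_OCI.<dates>.L3m.<period>.<suite>.<version>.<product>.<resolution>.nc
    parts = name.split(".")
    return {"period": parts[3], "resolution": parts[-2]}

for n in names:
    print(parse_granule(n))
```

For example, the first name parses to period "8D" at resolution "4km", which is the pattern the granule_name filters below key on.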

Filter to the files you want

Once you see the file names, you can filter to the ones you want using granule_name. For example, *.SNSP.*.4km.* gets the seasonal (quarterly) values. * is a wildcard.

%%time
import point_collocation as pc
plan = pc.plan(
    df,
    data_source="earthaccess",
    source_kwargs={
        "short_name": "PACE_OCI_L3M_AVW",
        "granule_name": "*.SNSP.*.4km.*"
    }
)
CPU times: user 20 ms, sys: 0 ns, total: 20 ms
Wall time: 473 ms
plan.summary()
Plan: 1 points → 1 unique granule(s)
  Points with 0 matches : 0
  Points with >1 matches: 0
  Time buffer: 0 days 00:00:00

First 1 point(s):
  [0] lat=30.0000, lon=-89.0000, time=2025-04-09 00:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20250321_20250620.L3m.SNSP.AVW.V3_1.avw.4km.nc
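These granule_name patterns are shell-style wildcards, so you can preview a pattern locally against the names from a plan summary with Python's fnmatch before re-running the plan (that granule_name matching behaves exactly like fnmatch is an assumption based on the patterns shown here):

```python
from fnmatch import fnmatch

# File names copied from the 16-granule plan summary above.
names = [
    "PACE_OCI.20250321_20250620.L3m.SNSP.AVW.V3_1.avw.4km.nc",
    "PACE_OCI.20250321_20250620.L3m.SNSP.AVW.V3_1.avw.0p1deg.nc",
    "PACE_OCI.20250409.L3m.DAY.AVW.V3_1.avw.4km.nc",
]

pattern = "*.SNSP.*.4km.*"
matches = [n for n in names if fnmatch(n, pattern)]
print(matches)  # keeps only the seasonal 4 km file
```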

Try many points

import pandas as pd
url = (
    "https://raw.githubusercontent.com/"
    "fish-pace/point-collocation/main/"
    "examples/fixtures/points.csv"
)
df_points = pd.read_csv(url)
print(len(df_points))

# Let's add on our own pc_id column
df_points = df_points.reset_index(drop=True)
df_points["pc_id"] = df_points.index + 1
df_points["pc_label"] = "pace_" + df_points["pc_id"].astype(str)

df_points.head()
595
lat lon date pc_id pc_label
0 27.3835 -82.7375 2024-06-13 1 pace_1
1 27.1190 -82.7125 2024-06-14 2 pace_2
2 26.9435 -82.8170 2024-06-14 3 pace_3
3 26.6875 -82.8065 2024-06-14 4 pace_4
4 26.6675 -82.6455 2024-06-14 5 pace_5

Get a plan for matchups from PACE data

For this example, we will just get a plan for the first 100 points so that it runs quickly.

%%time
import point_collocation as pc
plan = pc.plan(
    df_points[0:100],
    data_source="earthaccess",
    source_kwargs={
        "short_name": "PACE_OCI_L3M_AVW",
        "granule_name": "*.DAY.*.4km.*",
    }
)
CPU times: user 40.8 ms, sys: 1.51 ms, total: 42.3 ms
Wall time: 580 ms
plan.summary(n=0)
Plan: 100 points → 18 unique granule(s)
  Points with 0 matches : 0
  Points with >1 matches: 0
  Time buffer: 0 days 00:00:00
plan.open_dataset(0)
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
<xarray.Dataset> Size: 149MB
Dimensions:  (lat: 4320, lon: 8640, rgb: 3, eightbitcolor: 256)
Coordinates:
  * lat      (lat) float32 17kB 89.98 89.94 89.9 89.85 ... -89.9 -89.94 -89.98
  * lon      (lon) float32 35kB -180.0 -179.9 -179.9 ... 179.9 179.9 180.0
Dimensions without coordinates: rgb, eightbitcolor
Data variables:
    avw      (lat, lon) float32 149MB dask.array<chunksize=(512, 1024), meta=np.ndarray>
    palette  (rgb, eightbitcolor) uint8 768B dask.array<chunksize=(3, 256), meta=np.ndarray>
Attributes: (12/62)
    product_name:                      PACE_OCI.20240613.L3m.DAY.AVW.V3_1.avw...
    instrument:                        OCI
    title:                             OCI Level-3 Standard Mapped Image
    project:                           Ocean Biology Processing Group (NASA/G...
    platform:                          PACE
    source:                            satellite observations from OCI-PACE
    ...                                ...
    cdm_data_type:                     grid
    identifier_product_doi_authority:  http://dx.doi.org
    identifier_product_doi:            10.5067/PACE/OCI/L3M/AVW/3.1
    data_bins:                         3381968
    data_minimum:                      400.02658
    data_maximum:                      699.80536

Get 100 matchups using that plan

In a virtual machine in AWS us-west-2, where the NASA cloud data is hosted, this takes about 12 seconds. In Colab, say, it might take over a minute, since your compute is not in the same region or provider (Google versus AWS).

%%time
res = pc.matchup(plan, variables = ["avw"])
CPU times: user 5.3 s, sys: 155 ms, total: 5.45 s
Wall time: 12.7 s
res.head()
lat lon time pc_id pc_label granule_id granule_time granule_lat granule_lon avw
0 27.3835 -82.7375 2024-06-13 12:00:00 1 pace_1 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2024-06-13 11:59:59+00:00 27.395832 -82.729164 NaN
1 27.1190 -82.7125 2024-06-14 12:00:00 2 pace_2 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2024-06-14 11:59:59+00:00 27.104164 -82.729164 NaN
2 26.9435 -82.8170 2024-06-14 12:00:00 3 pace_3 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2024-06-14 11:59:59+00:00 26.937498 -82.812500 NaN
3 26.6875 -82.8065 2024-06-14 12:00:00 4 pace_4 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2024-06-14 11:59:59+00:00 26.687498 -82.812500 NaN
4 26.6675 -82.6455 2024-06-14 12:00:00 5 pace_5 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2024-06-14 11:59:59+00:00 26.687498 -82.645828 NaN
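Some points return NaN because the daily grid cell had no valid data (clouds, for example). You can drop those rows and, if you like, check how far each matched grid-cell center is from your point. A minimal haversine sketch, using toy values copied from the table above in place of the real res:

```python
import numpy as np
import pandas as pd

# Toy stand-in for a matchup result; NaN means no valid data in that cell.
res = pd.DataFrame(
    {
        "lat": [27.3835, 26.6875],
        "lon": [-82.7375, -82.8065],
        "granule_lat": [27.395832, 26.687498],
        "granule_lon": [-82.729164, -82.812500],
        "avw": [np.nan, 547.2],
    }
)

# Drop points with no valid match.
valid = res.dropna(subset=["avw"]).copy()

# Haversine distance (km) between each point and its matched grid cell.
R = 6371.0
lat1, lon1 = np.radians(valid["lat"]), np.radians(valid["lon"])
lat2, lon2 = np.radians(valid["granule_lat"]), np.radians(valid["granule_lon"])
a = (
    np.sin((lat2 - lat1) / 2) ** 2
    + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
)
valid["dist_km"] = 2 * R * np.arcsin(np.sqrt(a))
print(valid[["lat", "lon", "avw", "dist_km"]])
```

For a 4 km grid the offset should stay under half a cell diagonal (roughly 3 km); a much larger distance would suggest a coordinate problem in your points.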

Try lots of products

Pick a recent data point so the near-real-time (NRT) products have a chance of matching. Not all products will have files.

import pandas as pd
time = "2026-01-09"
lat = 30.0
lon = -89.0

df = pd.DataFrame(
    {
        "lat": [lat],
        "lon": [lon],
        "time": [time],
    }
)
df["time"] = pd.to_datetime(df["time"])
import earthaccess
results = earthaccess.search_datasets(instrument="oci")

short_names = [
    item.summary()["short-name"]
    for item in results
    if "L3M" in item.summary()["short-name"]
]

print(short_names)
['PACE_OCI_L3M_UVAI_UAA_NRT', 'PACE_OCI_L3M_UVAI_UAA', 'PACE_OCI_L3M_AER_UAA_NRT', 'PACE_OCI_L3M_AER_UAA', 'PACE_OCI_L3M_AVW_NRT', 'PACE_OCI_L3M_AVW', 'PACE_OCI_L3M_CHL_NRT', 'PACE_OCI_L3M_CHL', 'PACE_OCI_L3M_CLOUD_MASK_NRT', 'PACE_OCI_L3M_CLOUD_MASK', 'PACE_OCI_L3M_CLOUD_NRT', 'PACE_OCI_L3M_CLOUD', 'PACE_OCI_L3M_KD_NRT', 'PACE_OCI_L3M_KD', 'PACE_OCI_L3M_FLH_NRT', 'PACE_OCI_L3M_FLH', 'PACE_OCI_L3M_LANDVI_NRT', 'PACE_OCI_L3M_LANDVI', 'PACE_OCI_L3M_IOP_NRT', 'PACE_OCI_L3M_IOP', 'PACE_OCI_L3M_POC_NRT', 'PACE_OCI_L3M_POC', 'PACE_OCI_L3M_PAR_NRT', 'PACE_OCI_L3M_PAR', 'PACE_OCI_L3M_CARBON', 'PACE_OCI_L3M_CARBON_NRT', 'PACE_OCI_L3M_RRS_NRT', 'PACE_OCI_L3M_RRS', 'PACE_OCI_L3M_SFREFL_NRT', 'PACE_OCI_L3M_SFREFL', 'PACE_OCI_L3M_TRGAS_NRT', 'PACE_OCI_L3M_TRGAS']
%%time
# Confirm works for all L3 products
# Good. PACE_OCI_L3M_TRGAS is slow.
import point_collocation as pc
for short_name in short_names:
    print(f"\n===== {short_name} =====")

    try:
        plan = pc.plan(
            df,
            data_source="earthaccess",
            source_kwargs={
                "short_name": short_name,
                "granule_name":"*.DAY.*",
             }
        )
        plan.open_dataset(0)
    except Exception as e:
        print("Failed:", e)
===== PACE_OCI_L3M_UVAI_UAA_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_UVAI_UAA =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_AER_UAA_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_AER_UAA =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)

===== PACE_OCI_L3M_AVW_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_AVW =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)

===== PACE_OCI_L3M_CHL_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_CHL =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)

===== PACE_OCI_L3M_CLOUD_MASK_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_CLOUD_MASK =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_CLOUD_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_CLOUD =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)

===== PACE_OCI_L3M_KD_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_KD =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)

===== PACE_OCI_L3M_FLH_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_FLH =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)

===== PACE_OCI_L3M_LANDVI_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_LANDVI =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)

===== PACE_OCI_L3M_IOP_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_IOP =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)

===== PACE_OCI_L3M_POC_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_POC =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)

===== PACE_OCI_L3M_PAR_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_PAR =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)

===== PACE_OCI_L3M_CARBON =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)

===== PACE_OCI_L3M_CARBON_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_RRS_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_RRS =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)

===== PACE_OCI_L3M_SFREFL_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_SFREFL =====
open_method: {'xarray_open': 'dataset', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': None}
Geolocation auto detected with cf_xarray: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)

===== PACE_OCI_L3M_TRGAS_NRT =====
Failed: result index 0 is out of range for a plan with 0 result(s). Valid indices are 0 to -1.

===== PACE_OCI_L3M_TRGAS =====
CPU times: user 2.81 s, sys: 224 ms, total: 3.04 s
Wall time: 5min 28s



---------------------------------------------------------------------------
KeyboardInterrupt: the PACE_OCI_L3M_TRGAS open was interrupted after several minutes. The full traceback (omitted here) shows the stall inside the remote h5netcdf/fsspec/s3fs byte-range read during plan.open_dataset(0).