PACE Level 2

The Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) Level-2 products provide geolocated (lat/lon) swath data. Each granule contains high-resolution measurements (∼1 km) organized in a two-dimensional swath (scan line × pixel), rather than a regular grid. These data include hyperspectral remote sensing reflectance (Rrs) across hundreds of wavelengths, along with derived products such as chlorophyll-a, inherent optical properties, and atmospheric parameters. We use the PACE_OCI_L2_AOP product for this demonstration.

Steps:

Create a plan for files to use pc.plan()
Print the plan to check it print(plan.summary())
Do the plan and get matchups pc.matchup(plan)

Note: In a virtual machine in AWS us-west-2, where NASA cloud data is, the point matchups are fast. In Colab, say, your compute is not in the same data region nor provider, and the same matchups might take 10x longer.

Prerequisites

The examples here use NASA EarthData and you need to have an account with EarthData. Make sure you can login.

# if needed
!pip install point-collocation

import earthaccess
earthaccess.login()

<earthaccess.auth.Auth at 0x7fd752432d20>

Here are the level 2 datasets

import earthaccess
results = earthaccess.search_datasets(instrument="oci")

short_names = [
    item.summary()["short-name"]
    for item in results
    if "L2" in item.summary()["short-name"]
]

print(short_names)

['PACE_OCI_L2_UVAI_UAA_NRT', 'PACE_OCI_L2_UVAI_UAA', 'PACE_OCI_L2_AER_UAA_NRT', 'PACE_OCI_L2_AER_UAA', 'PACE_OCI_L2_AOP_NRT', 'PACE_OCI_L2_AOP', 'PACE_OCI_L2_CLOUD_MASK_NRT', 'PACE_OCI_L2_CLOUD_MASK', 'PACE_OCI_L2_CLOUD_NRT', 'PACE_OCI_L2_CLOUD', 'PACE_OCI_L2_LANDVI_NRT', 'PACE_OCI_L2_LANDVI', 'PACE_OCI_L2_BGC_NRT', 'PACE_OCI_L2_BGC', 'PACE_OCI_L2_IOP_NRT', 'PACE_OCI_L2_IOP', 'PACE_OCI_L2_PAR_NRT', 'PACE_OCI_L2_PAR', 'PACE_OCI_L2_SFREFL_NRT', 'PACE_OCI_L2_SFREFL', 'PACE_OCI_L2_TRGAS_NRT', 'PACE_OCI_L2_TRGAS']

Load some points

import pandas as pd
url = (
    "https://raw.githubusercontent.com/"
    "fish-pace/point-collocation/main/"
    "examples/fixtures/points.csv"
)
df_points = pd.read_csv(url)
print(len(df_points))
df_points.head()

	lat	lon	date
0	27.3835	-82.7375	2024-06-13
1	27.1190	-82.7125	2024-06-14
2	26.9435	-82.8170	2024-06-14
3	26.6875	-82.8065	2024-06-14
4	26.6675	-82.6455	2024-06-14

Get a plan for matchups for 1st 50 points from PACE data

A time buffer is needed since the swath granules are short time windows. We set time_buffer="12h" to find granules that are within 12 hours of our point times.

%%time
import point_collocation as pc
plan = pc.plan(
    df_points[0:50],    
    data_source="earthaccess",
    source_kwargs={
        "short_name": "PACE_OCI_L2_AOP",
    },
    time_buffer="12h"
)

CPU times: user 21.1 ms, sys: 8.6 ms, total: 29.7 ms
Wall time: 510 ms

plan.summary()

Plan: 50 points → 13 unique granule(s)
  Points with 0 matches : 0
  Points with >1 matches: 10
  Time buffer: 0 days 12:00:00

First 5 point(s):
  [0] lat=27.3835, lon=-82.7375, time=2024-06-13 12:00:00: 2 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240613T171620.L2.OC_AOP.V3_1.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240613T184939.L2.OC_AOP.V3_1.nc
  [1] lat=27.1190, lon=-82.7125, time=2024-06-14 12:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240614T175104.L2.OC_AOP.V3_1.nc
  [2] lat=26.9435, lon=-82.8170, time=2024-06-14 12:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240614T175104.L2.OC_AOP.V3_1.nc
  [3] lat=26.6875, lon=-82.8065, time=2024-06-14 12:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240614T175104.L2.OC_AOP.V3_1.nc
  [4] lat=26.6675, lon=-82.6455, time=2024-06-14 12:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240614T175104.L2.OC_AOP.V3_1.nc

Look at the granule to see groups

We will use plan.open_dataset(0) to open the first granule and take a look. This uses open_method="auto". It will try xr.open_dataset, discover no lat/lon and then try xr.open_datatree. You will need to specify what groups to merge. Looking at the datatree, we see want these two groups: /geophysical_data and /navigation_data. We will create a open method dictionary.

%%time
plan.open_dataset(0, open_method="datatree")

open_method: {'xarray_open': 'datatree', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'merge': None, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None}
CPU times: user 276 ms, sys: 61.8 ms, total: 338 ms
Wall time: 853 ms

<xarray.DataTree>
Group: /
│   Attributes: (12/47)
│       title:                             OCI Level-2 Data AOP
│       product_name:                      PACE_OCI.20240613T171120.L2.OC_AOP.V3_...
│       processing_version:                3.1
│       history:                           l2gen par=/data18/sdpsoper/vdc/vpu37/w...
│       instrument:                        OCI
│       platform:                          PACE
│       ...                                ...
│       geospatial_lon_min:                -82.05421
│       startDirection:                    Ascending
│       endDirection:                      Ascending
│       day_night_flag:                    Day
│       earth_sun_distance_correction:     0.9694168567657471
│       geospatial_bounds:                 POLYGON ((-55.18162 31.32767, -82.0542...
├── Group: /sensor_band_parameters
│       Dimensions:        (number_of_bands: 286, number_of_reflective_bands: 286,
│                           wavelength_3d: 172)
│       Coordinates:
│         * wavelength_3d  (wavelength_3d) float64 1kB 346.0 348.0 351.0 ... 717.0 719.0
│       Dimensions without coordinates: number_of_bands, number_of_reflective_bands
│       Data variables:
│           wavelength     (number_of_bands) float64 2kB dask.array<chunksize=(32,), meta=np.ndarray>
│           vcal_gain      (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           vcal_offset    (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           F0             (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           aw             (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           bbw            (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           k_oz           (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           k_no2          (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           Tau_r          (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
├── Group: /scan_line_attributes
│       Dimensions:  (number_of_lines: 1710)
│       Dimensions without coordinates: number_of_lines
│       Data variables: (12/13)
│           year     (number_of_lines) float64 14kB dask.array<chunksize=(32,), meta=np.ndarray>
│           day      (number_of_lines) float64 14kB dask.array<chunksize=(32,), meta=np.ndarray>
│           msec     (number_of_lines) float64 14kB dask.array<chunksize=(32,), meta=np.ndarray>
│           time     (number_of_lines) datetime64[ns] 14kB dask.array<chunksize=(32,), meta=np.ndarray>
│           detnum   (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           mside    (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           ...       ...
│           clon     (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           elon     (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           slat     (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           clat     (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           elat     (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           csol_z   (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
├── Group: /geophysical_data
│       Dimensions:   (number_of_lines: 1710, pixels_per_line: 1272, wavelength_3d: 172)
│       Dimensions without coordinates: number_of_lines, pixels_per_line, wavelength_3d
│       Data variables:
│           Rrs       (number_of_lines, pixels_per_line, wavelength_3d) float32 1GB dask.array<chunksize=(32, 256, 40), meta=np.ndarray>
│           Rrs_unc   (number_of_lines, pixels_per_line, wavelength_3d) float32 1GB dask.array<chunksize=(32, 256, 40), meta=np.ndarray>
│           aot_865   (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
│           angstrom  (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
│           avw       (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
│           nflh      (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
│           l2_flags  (number_of_lines, pixels_per_line) int32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
├── Group: /navigation_data
│       Dimensions:    (number_of_lines: 1710, pixels_per_line: 1272)
│       Dimensions without coordinates: number_of_lines, pixels_per_line
│       Data variables:
│           longitude  (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
│           latitude   (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
│           tilt       (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│       Attributes:
│           gringpointlongitude:  [-75.3737   -51.710373 -54.89105  -55.18162  -82.05...
│           gringpointlatitude:   [ 3.1425157  8.132527  28.826893  31.327667  25.995...
│           gringpointsequence:   [1 2 3 4 5 6]
└── Group: /processing_control
    │   Attributes:
    │       software_name:     l2gen
    │       software_version:  9.11.0-09b1279b9
    │       input_sources:     PACE_OCI.20240613T171120.L1B.V3.nc,oci_gas_transmittan...
    │       calibration_data:  oci_gains_v3.1_20250722.nc
    │       mask_names:        ATMFAIL,LAND,CLDICE,HILT
    ├── Group: /processing_control/input_parameters
    │       Attributes: (12/256)
    │           ifile:                   PACE_OCI.20240613T171120.L1B.V3.nc
    │           ofile:                   PACE_OCI.20240613T171120.L2.OC_AOP.V3_1.nc
    │           l2prod:                  Rrs Rrs_unc aot_865 angstrom avw nflh
    │           oformat:                 netCDF4
    │           oformat_depth:           8bit
    │           fqfile:                  $OCDATAROOT/common/morel_fq_hyperspectral.nc
    │           ...                      ...
    │           spixl:                   1
    │           epixl:                   -1
    │           dpixl:                   1
    │           sline:                   1
    │           eline:                   -1
    │           dline:                   1
    └── Group: /processing_control/flag_percentages
            Attributes: (12/28)
                ATMFAIL:     0.023676855
                LAND:        21.685057
                PRODWARN:    6.5449724
                HIGLINT:     8.993618
                HILT:        1.804636
                HISATZEN:    13.917301
                ...          ...
                SEAICE:      0.0
                NAVFAIL:     0.0
                FILTER:      0.0
                BOWTIEDEL:   0.0
                HIPOL:       0.037285298
                PRODFAIL:    86.22021

# we could use xarray_open="datatree" but "dataset" tends to be faster here
pace_l2 = {'xarray_open': 'dataset', 'merge': ['/geophysical_data', '/navigation_data']}
plan.open_dataset(0, open_method=pace_l2)

open_method: {'xarray_open': 'dataset', 'merge': ['/geophysical_data', '/navigation_data'], 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge_kwargs': {}}

<xarray.Dataset> Size: 3GB
Dimensions:    (number_of_lines: 1710, pixels_per_line: 1272, wavelength_3d: 172)
Coordinates:
    longitude  (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
    latitude   (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
Dimensions without coordinates: number_of_lines, pixels_per_line, wavelength_3d
Data variables:
    Rrs        (number_of_lines, pixels_per_line, wavelength_3d) float32 1GB dask.array<chunksize=(32, 256, 40), meta=np.ndarray>
    Rrs_unc    (number_of_lines, pixels_per_line, wavelength_3d) float32 1GB dask.array<chunksize=(32, 256, 40), meta=np.ndarray>
    aot_865    (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
    angstrom   (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
    avw        (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
    nflh       (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
    l2_flags   (number_of_lines, pixels_per_line) int32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
    tilt       (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>

Get the matchups using that plan

pc.matchup() will open each L2 granule as a DataTree and merges all groups into a flat dataset. The spatial method will automatically use "kdtree" for non-gridded data. You can also try spatial_method="xoak-kdtree", which uses a similar algorithm.

Notice, that point 0 is matched to 2 granules and so has 2 rows with the same pc_id. Many matchups are NaN because there is no data (cloudy) for that lat/lon point.

%%time
# spatial_method="auto" defaults to "kdtree" in this case
res = pc.matchup(plan, variables=["Rrs"], open_method=pace_l2)

CPU times: user 22.4 s, sys: 2.21 s, total: 24.7 s
Wall time: 43.9 s

res.head()

	lat	lon	time	pc_id	granule_id	granule_time	granule_lat	granule_lon	Rrs_0	Rrs_1	...	Rrs_162	Rrs_163	Rrs_164	Rrs_165	Rrs_166	Rrs_167	Rrs_168	Rrs_169	Rrs_170	Rrs_171
0	27.3835	-82.7375	2024-06-13 12:00:00	0	https://obdaac-tea.earthdatacloud.nasa.gov/ob-...	2024-06-13 17:18:49+00:00	27.443144	-82.612923	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	27.3835	-82.7375	2024-06-13 12:00:00	0	https://obdaac-tea.earthdatacloud.nasa.gov/ob-...	2024-06-13 18:52:08+00:00	27.383293	-82.721527	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	27.1190	-82.7125	2024-06-14 12:00:00	1	https://obdaac-tea.earthdatacloud.nasa.gov/ob-...	2024-06-14 17:53:34+00:00	27.101389	-82.717186	0.01299	0.012946	...	0.000238	0.000228	0.000198	0.000194	0.000186	0.000172	0.000152	0.000122	0.000108	0.000094
3	26.9435	-82.8170	2024-06-14 12:00:00	2	https://obdaac-tea.earthdatacloud.nasa.gov/ob-...	2024-06-14 17:53:34+00:00	26.954554	-82.810219	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	26.6875	-82.8065	2024-06-14 12:00:00	3	https://obdaac-tea.earthdatacloud.nasa.gov/ob-...	2024-06-14 17:53:34+00:00	26.703817	-82.817726	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 180 columns

%%time
# we can specify to use the xoak package
res2 = pc.matchup(plan, 
                  spatial_method="xoak-kdtree",
                  variables=["Rrs"], 
                  open_method=pace_l2)

CPU times: user 26.2 s, sys: 1.22 s, total: 27.4 s
Wall time: 42.7 s