Skip to content

Open in Colab

PACE Level 2

The Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) Level-2 products provide geolocated (lat/lon) swath data. Each granule contains high-resolution measurements (∼1 km) organized in a two-dimensional swath (scan line × pixel), rather than a regular grid. These data include hyperspectral remote sensing reflectance (Rrs) across hundreds of wavelengths, along with derived products such as chlorophyll-a, inherent optical properties, and atmospheric parameters. We use the PACE_OCI_L2_AOP product for this demonstration.

Steps:

  • Create a plan for files to use pc.plan()
  • Print the plan to check it print(plan.summary())
  • Do the plan and get matchups pc.matchup(plan)

Note: In a virtual machine in AWS us-west-2, where NASA cloud data is, the point matchups are fast. In Colab, say, your compute is not in the same data region nor provider, and the same matchups might take 10x longer.

Prerequisites

The examples here use NASA EarthData and you need to have an account with EarthData. Make sure you can login.

# if needed
!pip install point-collocation
import earthaccess
earthaccess.login()
<earthaccess.auth.Auth at 0x7fd752432d20>

Here are the level 2 datasets

import earthaccess
results = earthaccess.search_datasets(instrument="oci")

short_names = [
    item.summary()["short-name"]
    for item in results
    if "L2" in item.summary()["short-name"]
]

print(short_names)
['PACE_OCI_L2_UVAI_UAA_NRT', 'PACE_OCI_L2_UVAI_UAA', 'PACE_OCI_L2_AER_UAA_NRT', 'PACE_OCI_L2_AER_UAA', 'PACE_OCI_L2_AOP_NRT', 'PACE_OCI_L2_AOP', 'PACE_OCI_L2_CLOUD_MASK_NRT', 'PACE_OCI_L2_CLOUD_MASK', 'PACE_OCI_L2_CLOUD_NRT', 'PACE_OCI_L2_CLOUD', 'PACE_OCI_L2_LANDVI_NRT', 'PACE_OCI_L2_LANDVI', 'PACE_OCI_L2_BGC_NRT', 'PACE_OCI_L2_BGC', 'PACE_OCI_L2_IOP_NRT', 'PACE_OCI_L2_IOP', 'PACE_OCI_L2_PAR_NRT', 'PACE_OCI_L2_PAR', 'PACE_OCI_L2_SFREFL_NRT', 'PACE_OCI_L2_SFREFL', 'PACE_OCI_L2_TRGAS_NRT', 'PACE_OCI_L2_TRGAS']

Load some points

import pandas as pd
url = (
    "https://raw.githubusercontent.com/"
    "fish-pace/point-collocation/main/"
    "examples/fixtures/points.csv"
)
df_points = pd.read_csv(url)
print(len(df_points))
df_points.head()
595
lat lon date
0 27.3835 -82.7375 2024-06-13
1 27.1190 -82.7125 2024-06-14
2 26.9435 -82.8170 2024-06-14
3 26.6875 -82.8065 2024-06-14
4 26.6675 -82.6455 2024-06-14

Get a plan for matchups for 1st 50 points from PACE data

A time buffer is needed since the swath granules are short time windows. We set time_buffer="12h" to find granules that are within 12 hours of our point times.

%%time
import point_collocation as pc
plan = pc.plan(
    df_points[0:50],    
    data_source="earthaccess",
    source_kwargs={
        "short_name": "PACE_OCI_L2_AOP",
    },
    time_buffer="12h"
)
CPU times: user 21.1 ms, sys: 8.6 ms, total: 29.7 ms
Wall time: 510 ms
plan.summary()
Plan: 50 points → 13 unique granule(s)
  Points with 0 matches : 0
  Points with >1 matches: 10
  Time buffer: 0 days 12:00:00

First 5 point(s):
  [0] lat=27.3835, lon=-82.7375, time=2024-06-13 12:00:00: 2 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240613T171620.L2.OC_AOP.V3_1.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240613T184939.L2.OC_AOP.V3_1.nc
  [1] lat=27.1190, lon=-82.7125, time=2024-06-14 12:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240614T175104.L2.OC_AOP.V3_1.nc
  [2] lat=26.9435, lon=-82.8170, time=2024-06-14 12:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240614T175104.L2.OC_AOP.V3_1.nc
  [3] lat=26.6875, lon=-82.8065, time=2024-06-14 12:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240614T175104.L2.OC_AOP.V3_1.nc
  [4] lat=26.6675, lon=-82.6455, time=2024-06-14 12:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240614T175104.L2.OC_AOP.V3_1.nc

Look at the granule to see groups

We will use plan.open_dataset(0) to open the first granule and take a look. This uses open_method="auto". It will try xr.open_dataset, discover no lat/lon and then try xr.open_datatree. You will need to specify what groups to merge. Looking at the datatree, we see want these two groups: /geophysical_data and /navigation_data. We will create a open method dictionary.

%%time
plan.open_dataset(0, open_method="datatree")
open_method: {'xarray_open': 'datatree', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'merge': None, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None}
CPU times: user 276 ms, sys: 61.8 ms, total: 338 ms
Wall time: 853 ms
<xarray.DataTree>
Group: /
│   Attributes: (12/47)
│       title:                             OCI Level-2 Data AOP
│       product_name:                      PACE_OCI.20240613T171120.L2.OC_AOP.V3_...
│       processing_version:                3.1
│       history:                           l2gen par=/data18/sdpsoper/vdc/vpu37/w...
│       instrument:                        OCI
│       platform:                          PACE
│       ...                                ...
│       geospatial_lon_min:                -82.05421
│       startDirection:                    Ascending
│       endDirection:                      Ascending
│       day_night_flag:                    Day
│       earth_sun_distance_correction:     0.9694168567657471
│       geospatial_bounds:                 POLYGON ((-55.18162 31.32767, -82.0542...
├── Group: /sensor_band_parameters
│       Dimensions:        (number_of_bands: 286, number_of_reflective_bands: 286,
│                           wavelength_3d: 172)
│       Coordinates:
│         * wavelength_3d  (wavelength_3d) float64 1kB 346.0 348.0 351.0 ... 717.0 719.0
│       Dimensions without coordinates: number_of_bands, number_of_reflective_bands
│       Data variables:
│           wavelength     (number_of_bands) float64 2kB dask.array<chunksize=(32,), meta=np.ndarray>
│           vcal_gain      (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           vcal_offset    (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           F0             (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           aw             (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           bbw            (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           k_oz           (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           k_no2          (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
│           Tau_r          (number_of_reflective_bands) float32 1kB dask.array<chunksize=(32,), meta=np.ndarray>
├── Group: /scan_line_attributes
│       Dimensions:  (number_of_lines: 1710)
│       Dimensions without coordinates: number_of_lines
│       Data variables: (12/13)
│           year     (number_of_lines) float64 14kB dask.array<chunksize=(32,), meta=np.ndarray>
│           day      (number_of_lines) float64 14kB dask.array<chunksize=(32,), meta=np.ndarray>
│           msec     (number_of_lines) float64 14kB dask.array<chunksize=(32,), meta=np.ndarray>
│           time     (number_of_lines) datetime64[ns] 14kB dask.array<chunksize=(32,), meta=np.ndarray>
│           detnum   (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           mside    (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           ...       ...
│           clon     (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           elon     (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           slat     (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           clat     (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           elat     (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│           csol_z   (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
├── Group: /geophysical_data
│       Dimensions:   (number_of_lines: 1710, pixels_per_line: 1272, wavelength_3d: 172)
│       Dimensions without coordinates: number_of_lines, pixels_per_line, wavelength_3d
│       Data variables:
│           Rrs       (number_of_lines, pixels_per_line, wavelength_3d) float32 1GB dask.array<chunksize=(32, 256, 40), meta=np.ndarray>
│           Rrs_unc   (number_of_lines, pixels_per_line, wavelength_3d) float32 1GB dask.array<chunksize=(32, 256, 40), meta=np.ndarray>
│           aot_865   (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
│           angstrom  (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
│           avw       (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
│           nflh      (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
│           l2_flags  (number_of_lines, pixels_per_line) int32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
├── Group: /navigation_data
│       Dimensions:    (number_of_lines: 1710, pixels_per_line: 1272)
│       Dimensions without coordinates: number_of_lines, pixels_per_line
│       Data variables:
│           longitude  (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
│           latitude   (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
│           tilt       (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>
│       Attributes:
│           gringpointlongitude:  [-75.3737   -51.710373 -54.89105  -55.18162  -82.05...
│           gringpointlatitude:   [ 3.1425157  8.132527  28.826893  31.327667  25.995...
│           gringpointsequence:   [1 2 3 4 5 6]
└── Group: /processing_control
    │   Attributes:
    │       software_name:     l2gen
    │       software_version:  9.11.0-09b1279b9
    │       input_sources:     PACE_OCI.20240613T171120.L1B.V3.nc,oci_gas_transmittan...
    │       calibration_data:  oci_gains_v3.1_20250722.nc
    │       mask_names:        ATMFAIL,LAND,CLDICE,HILT
    ├── Group: /processing_control/input_parameters
    │       Attributes: (12/256)
    │           ifile:                   PACE_OCI.20240613T171120.L1B.V3.nc
    │           ofile:                   PACE_OCI.20240613T171120.L2.OC_AOP.V3_1.nc
    │           l2prod:                  Rrs Rrs_unc aot_865 angstrom avw nflh
    │           oformat:                 netCDF4
    │           oformat_depth:           8bit
    │           fqfile:                  $OCDATAROOT/common/morel_fq_hyperspectral.nc
    │           ...                      ...
    │           spixl:                   1
    │           epixl:                   -1
    │           dpixl:                   1
    │           sline:                   1
    │           eline:                   -1
    │           dline:                   1
    └── Group: /processing_control/flag_percentages
            Attributes: (12/28)
                ATMFAIL:     0.023676855
                LAND:        21.685057
                PRODWARN:    6.5449724
                HIGLINT:     8.993618
                HILT:        1.804636
                HISATZEN:    13.917301
                ...          ...
                SEAICE:      0.0
                NAVFAIL:     0.0
                FILTER:      0.0
                BOWTIEDEL:   0.0
                HIPOL:       0.037285298
                PRODFAIL:    86.22021
# we could use xarray_open="datatree" but "dataset" tends to be faster here
pace_l2 = {'xarray_open': 'dataset', 'merge': ['/geophysical_data', '/navigation_data']}
plan.open_dataset(0, open_method=pace_l2)
open_method: {'xarray_open': 'dataset', 'merge': ['/geophysical_data', '/navigation_data'], 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge_kwargs': {}}
<xarray.Dataset> Size: 3GB
Dimensions:    (number_of_lines: 1710, pixels_per_line: 1272, wavelength_3d: 172)
Coordinates:
    longitude  (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
    latitude   (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
Dimensions without coordinates: number_of_lines, pixels_per_line, wavelength_3d
Data variables:
    Rrs        (number_of_lines, pixels_per_line, wavelength_3d) float32 1GB dask.array<chunksize=(32, 256, 40), meta=np.ndarray>
    Rrs_unc    (number_of_lines, pixels_per_line, wavelength_3d) float32 1GB dask.array<chunksize=(32, 256, 40), meta=np.ndarray>
    aot_865    (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
    angstrom   (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
    avw        (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
    nflh       (number_of_lines, pixels_per_line) float32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
    l2_flags   (number_of_lines, pixels_per_line) int32 9MB dask.array<chunksize=(256, 1272), meta=np.ndarray>
    tilt       (number_of_lines) float32 7kB dask.array<chunksize=(32,), meta=np.ndarray>

Get the matchups using that plan

pc.matchup() will open each L2 granule as a DataTree and merges all groups into a flat dataset. The spatial method will automatically use "kdtree" for non-gridded data. You can also try spatial_method="xoak-kdtree", which uses a similar algorithm.

Notice, that point 0 is matched to 2 granules and so has 2 rows with the same pc_id. Many matchups are NaN because there is no data (cloudy) for that lat/lon point.

%%time
# spatial_method="auto" defaults to "kdtree" in this case
res = pc.matchup(plan, variables=["Rrs"], open_method=pace_l2)
CPU times: user 22.4 s, sys: 2.21 s, total: 24.7 s
Wall time: 43.9 s
res.head()
lat lon time pc_id granule_id granule_time granule_lat granule_lon Rrs_0 Rrs_1 ... Rrs_162 Rrs_163 Rrs_164 Rrs_165 Rrs_166 Rrs_167 Rrs_168 Rrs_169 Rrs_170 Rrs_171
0 27.3835 -82.7375 2024-06-13 12:00:00 0 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2024-06-13 17:18:49+00:00 27.443144 -82.612923 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 27.3835 -82.7375 2024-06-13 12:00:00 0 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2024-06-13 18:52:08+00:00 27.383293 -82.721527 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 27.1190 -82.7125 2024-06-14 12:00:00 1 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2024-06-14 17:53:34+00:00 27.101389 -82.717186 0.01299 0.012946 ... 0.000238 0.000228 0.000198 0.000194 0.000186 0.000172 0.000152 0.000122 0.000108 0.000094
3 26.9435 -82.8170 2024-06-14 12:00:00 2 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2024-06-14 17:53:34+00:00 26.954554 -82.810219 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 26.6875 -82.8065 2024-06-14 12:00:00 3 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 2024-06-14 17:53:34+00:00 26.703817 -82.817726 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 180 columns

%%time
# we can specify to use the xoak package
res2 = pc.matchup(plan, 
                  spatial_method="xoak-kdtree",
                  variables=["Rrs"], 
                  open_method=pace_l2)
CPU times: user 26.2 s, sys: 1.22 s, total: 27.4 s
Wall time: 42.7 s