Quickstart

point-collocation matches each lat/lon point to the pixel whose center is closest to it (equivalent to method="nearest"). For time, a buffer of 0 means the point's time must fall within the file's time range, while a buffer such as buffer="1D" also finds files within 1 day of the point. A buffer can help with L2 files that cover short windows (minutes) or with collections that have infrequent files.
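As a rough illustration of the time rule described above, the sketch below checks whether a point's time falls inside a file's time range, optionally widened by a buffer. The function name `file_matches`, the inclusive comparison, and the example times are assumptions made for illustration; they are not taken from the point-collocation source.

```python
from datetime import datetime, timedelta


def file_matches(point_time, file_start, file_end, buffer=timedelta(0)):
    """Return True if point_time lies in [file_start - buffer, file_end + buffer].

    Hypothetical helper: the inclusive bounds are an assumption.
    """
    return file_start - buffer <= point_time <= file_end + buffer


point_time = datetime(2024, 6, 13, 12, 0)

# A monthly file covering all of June 2024 (end-of-day time is assumed).
monthly = (datetime(2024, 6, 1), datetime(2024, 6, 30, 23, 59, 59))

# A hypothetical L2 granule with a 5-minute window later the same day.
swath = (datetime(2024, 6, 13, 18, 0), datetime(2024, 6, 13, 18, 5))

print(file_matches(point_time, *monthly))                         # True
print(file_matches(point_time, *swath))                           # False
print(file_matches(point_time, *swath, buffer=timedelta(days=1)))  # True
```

With a zero buffer the short L2 window misses the point entirely, which is the situation where a buffer such as "1D" becomes useful.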

  • Create a plan of the files to use: pc.plan()
  • Print the plan to check it: plan.summary()
  • Run the plan and get matchups for the variables: pc.matchup(plan, geometry='grid', variables=['var'])

Prerequisite -- Log in to EarthData

The examples here use NASA EarthData, so you need an EarthData account. Make sure you can log in.

import earthaccess
earthaccess.login()

<earthaccess.auth.Auth at 0x11212d7f0>

Get some points to matchup

from pathlib import Path
import pandas as pd

HERE = Path.cwd()
POINTS_CSV = HERE / "fixtures" / "points.csv"
df_points = pd.read_csv(POINTS_CSV)  # lat, lon, date columns
print(len(df_points))
df_points.head()
595
   lat      lon       date
0  27.3835  -82.7375  2024-06-13
1  27.1190  -82.7125  2024-06-14
2  26.9435  -82.8170  2024-06-14
3  26.6875  -82.8065  2024-06-14
4  26.6675  -82.6455  2024-06-14

Start plan -- Take a look at the files in a collection

Now we use the point_collocation package. First we will look at the files available and figure out which ones we want.

%%time
import point_collocation as pc
plan = pc.plan(
    df_points,
    data_source="earthaccess",
    source_kwargs={
        "short_name": "PACE_OCI_L3M_RRS",
    }
)
CPU times: user 1.82 s, sys: 72.5 ms, total: 1.89 s
Wall time: 8.25 s
plan.summary(n=1)
Plan: 595 points → 210 unique granule(s)
  Points with 0 matches : 0
  Points with >1 matches: 595
  Time buffer: 0 days 00:00:00

First 1 point(s):
  [0] lat=27.3835, lon=-82.7375, time=2024-06-13 12:00:00: 16 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240321_20240620.L3m.SNSP.RRS.V3_1.Rrs.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240321_20240620.L3m.SNSP.RRS.V3_1.Rrs.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240516_20240616.L3m.R32.RRS.V3_1.Rrs.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240516_20240616.L3m.R32.RRS.V3_1.Rrs.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240524_20240624.L3m.R32.RRS.V3_1.Rrs.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240524_20240624.L3m.R32.RRS.V3_1.Rrs.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240601_20240630.L3m.MO.RRS.V3_1.Rrs.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240601_20240630.L3m.MO.RRS.V3_1.Rrs.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240601_20240702.L3m.R32.RRS.V3_1.Rrs.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240601_20240702.L3m.R32.RRS.V3_1.Rrs.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240609_20240616.L3m.8D.RRS.V3_1.Rrs.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240609_20240616.L3m.8D.RRS.V3_1.Rrs.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240609_20240710.L3m.R32.RRS.V3_1.Rrs.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240609_20240710.L3m.R32.RRS.V3_1.Rrs.4km.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240613.L3m.DAY.RRS.V3_1.Rrs.0p1deg.nc
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240613.L3m.DAY.RRS.V3_1.Rrs.4km.nc
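With 16 matches for a single point, it can help to tally the file names by period and resolution before deciding on a filter. The sketch below does this with the standard library. It assumes, based only on the names listed above, that the fourth dot-separated field is the period code and the second-to-last field is the resolution; the helper variables are illustrative and not part of point-collocation.

```python
from collections import Counter

# (date range, period code) pairs taken from the 16 file names listed above.
periods = [
    ("20240321_20240620", "SNSP"),
    ("20240516_20240616", "R32"),
    ("20240524_20240624", "R32"),
    ("20240601_20240630", "MO"),
    ("20240601_20240702", "R32"),
    ("20240609_20240616", "8D"),
    ("20240609_20240710", "R32"),
    ("20240613", "DAY"),
]

# Each period appears once per resolution in the listing.
names = [
    f"PACE_OCI.{dates}.L3m.{period}.RRS.V3_1.Rrs.{resolution}.nc"
    for dates, period in periods
    for resolution in ("0p1deg", "4km")
]

# Assumed layout: field 3 is the period code, field -2 is the resolution.
by_period = Counter(name.split(".")[3] for name in names)
by_resolution = Counter(name.split(".")[-2] for name in names)

for period, count in sorted(by_period.items()):
    print(f"{period:>5}: {count} file(s)")
for resolution, count in sorted(by_resolution.items()):
    print(f"{resolution:>7}: {count} file(s)")
```

Only two of the 16 names carry the MO code, one per resolution, so a filter on both fields should leave a single file for this point.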

Create new plan with filter on file names

We will use the monthly 4km files.

%%time
import point_collocation as pc
plan = pc.plan(
    df_points,
    data_source="earthaccess",
    source_kwargs={
        "short_name": "PACE_OCI_L3M_RRS",
        "granule_name": "*.MO.*.4km.*",
    }
)
CPU times: user 93.2 ms, sys: 32.5 ms, total: 126 ms
Wall time: 3.25 s
# check the plan and see how many files per point
# we want 1 file per point in this case
# Looks like 4 monthly files
plan.summary()
Plan: 595 points → 4 unique granule(s)
  Points with 0 matches : 0
  Points with >1 matches: 0
  Time buffer: 0 days 00:00:00

First 5 point(s):
  [0] lat=27.3835, lon=-82.7375, time=2024-06-13 12:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240601_20240630.L3m.MO.RRS.V3_1.Rrs.4km.nc
  [1] lat=27.1190, lon=-82.7125, time=2024-06-14 12:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240601_20240630.L3m.MO.RRS.V3_1.Rrs.4km.nc
  [2] lat=26.9435, lon=-82.8170, time=2024-06-14 12:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240601_20240630.L3m.MO.RRS.V3_1.Rrs.4km.nc
  [3] lat=26.6875, lon=-82.8065, time=2024-06-14 12:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240601_20240630.L3m.MO.RRS.V3_1.Rrs.4km.nc
  [4] lat=26.6675, lon=-82.6455, time=2024-06-14 12:00:00: 1 match(es)
    → https://obdaac-tea.earthdatacloud.nasa.gov/ob-cumulus-prod-public/PACE_OCI.20240601_20240630.L3m.MO.RRS.V3_1.Rrs.4km.nc
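The granule_name value used above reads like a shell-style wildcard pattern. To get a feel for which file names such a pattern keeps, here is a small sketch using Python's standard fnmatch module on a few of the names from the first plan. This only approximates the behaviour: the real filtering is done by the data source during the search, and its wildcard rules may differ in edge cases.

```python
from fnmatch import fnmatchcase

pattern = "*.MO.*.4km.*"

# A few file names copied from the first plan summary.
names = [
    "PACE_OCI.20240601_20240630.L3m.MO.RRS.V3_1.Rrs.0p1deg.nc",
    "PACE_OCI.20240601_20240630.L3m.MO.RRS.V3_1.Rrs.4km.nc",
    "PACE_OCI.20240601_20240702.L3m.R32.RRS.V3_1.Rrs.4km.nc",
    "PACE_OCI.20240613.L3m.DAY.RRS.V3_1.Rrs.4km.nc",
]

# fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch.
kept = [name for name in names if fnmatchcase(name, pattern)]

for name in kept:
    print(name)
# PACE_OCI.20240601_20240630.L3m.MO.RRS.V3_1.Rrs.4km.nc
```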

Check the variables in the files

This will open one file and show us the variables. We want 'Rrs' in this case.

plan.show_variables(geometry="grid")
geometry     : 'grid'
open_method  : 'dataset'
Dimensions : {'lat': 4320, 'lon': 8640, 'wavelength': 172, 'rgb': 3, 'eightbitcolor': 256}
Variables  : ['Rrs', 'palette']

Geolocation: ('lon', 'lat') — lon dims=('lon',), lat dims=('lat',)
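The dimensions reported above give a sense of how large the 'Rrs' array is. The following back-of-envelope calculation assumes each value is stored as a 4-byte float (float32) and ignores compression; both are assumptions for this estimate.

```python
# Dimensions as reported by plan.show_variables() above.
dims = {"lat": 4320, "lon": 8640, "wavelength": 172}

BYTES_PER_VALUE = 4  # assumption: float32 values

n_values = dims["lat"] * dims["lon"] * dims["wavelength"]
n_bytes = n_values * BYTES_PER_VALUE

print(f"{n_values:,} values")       # 6,419,865,600 values
print(f"{n_bytes / 1e9:.1f} GB")    # 25.7 GB
```

At roughly 26 GB uncompressed per file, reading only the pixels at the points is far cheaper than loading the whole array.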

Get the matchups using our plan

Let's start with 100 points, since all 595 might take a while.

%%time
res = pc.matchup(plan[0:100], geometry="grid", variables=["Rrs"])
CPU times: user 7.94 s, sys: 1.58 s, total: 9.52 s
Wall time: 46.7 s
res.head()
lat lon time granule_id Rrs_346 Rrs_348 Rrs_351 Rrs_353 Rrs_356 Rrs_358 ... Rrs_706 Rrs_707 Rrs_708 Rrs_709 Rrs_711 Rrs_712 Rrs_713 Rrs_714 Rrs_717 Rrs_719
0 27.3835 -82.7375 2024-06-13 12:00:00 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 0.004034 0.004070 0.004170 0.004278 0.004462 0.004604 ... 0.000224 0.000202 0.000190 0.000176 0.000168 0.000156 0.000144 0.000134 0.000158 0.000202
1 27.1190 -82.7125 2024-06-14 12:00:00 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 0.004562 0.004616 0.004700 0.004692 0.004806 0.005070 ... 0.000108 0.000094 0.000084 0.000078 0.000072 0.000066 0.000060 0.000048 0.000062 0.000098
2 26.9435 -82.8170 2024-06-14 12:00:00 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 0.005112 0.005282 0.005458 0.005582 0.005868 0.006226 ... 0.000118 0.000108 0.000102 0.000098 0.000098 0.000092 0.000086 0.000068 0.000052 0.000066
3 26.6875 -82.8065 2024-06-14 12:00:00 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 0.004648 0.004904 0.005108 0.005242 0.005548 0.005944 ... 0.000178 0.000158 0.000148 0.000138 0.000130 0.000126 0.000126 0.000120 0.000158 0.000230
4 26.6675 -82.6455 2024-06-14 12:00:00 https://obdaac-tea.earthdatacloud.nasa.gov/ob-... 0.004944 0.005064 0.005190 0.005288 0.005504 0.005838 ... 0.000094 0.000078 0.000068 0.000062 0.000058 0.000054 0.000052 0.000050 0.000106 0.000166

5 rows × 176 columns
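The result has one column per wavelength (Rrs_346, Rrs_348, ...), so a common next step is to pull one row's spectrum out as (wavelength, value) pairs. The sketch below uses a plain dict holding a few values from the first row shown above. It assumes the part of the column name after "Rrs_" is an integer wavelength in nm, which matches the columns displayed here but is not guaranteed for other products.

```python
# A few values from the first row of the result above (not the full 172 bands).
row = {
    "lat": 27.3835,
    "lon": -82.7375,
    "Rrs_346": 0.004034,
    "Rrs_348": 0.004070,
    "Rrs_351": 0.004170,
    "Rrs_719": 0.000202,
}

# Keep only the Rrs_* entries and parse the wavelength from the column name.
spectrum = sorted(
    (int(name.split("_", 1)[1]), value)
    for name, value in row.items()
    if name.startswith("Rrs_")
)

for wavelength, value in spectrum:
    print(f"{wavelength} nm: {value:.6f}")
```

The same comprehension should work on a DataFrame row, for example by iterating over res.iloc[0].items() instead of row.items().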

Open files in plan

Sometimes it is helpful to look at the granules directly, and there are helper functions for that. You need to specify the geometry of the data: "grid" for Level 3 gridded data or "swath" for Level 2 swath data.

ds = plan.open_dataset(plan[0], geometry="grid")
ds
<xarray.Dataset> Size: 26GB
Dimensions:     (lat: 4320, lon: 8640, wavelength: 172, rgb: 3,
                 eightbitcolor: 256)
Coordinates:
  * lat         (lat) float32 17kB 89.98 89.94 89.9 ... -89.9 -89.94 -89.98
  * lon         (lon) float32 35kB -180.0 -179.9 -179.9 ... 179.9 179.9 180.0
  * wavelength  (wavelength) float64 1kB 346.0 348.0 351.0 ... 714.0 717.0 719.0
Dimensions without coordinates: rgb, eightbitcolor
Data variables:
    Rrs         (lat, lon, wavelength) float32 26GB dask.array<chunksize=(16, 1024, 8), meta=np.ndarray>
    palette     (rgb, eightbitcolor) uint8 768B dask.array<chunksize=(3, 256), meta=np.ndarray>
Attributes: (12/64)
    product_name:                      PACE_OCI.20240601_20240630.L3m.MO.RRS....
    instrument:                        OCI
    title:                             OCI Level-3 Standard Mapped Image
    project:                           Ocean Biology Processing Group (NASA/G...
    platform:                          PACE
    source:                            satellite observations from OCI-PACE
    ...                                ...
    identifier_product_doi:            10.5067/PACE/OCI/L3M/RRS/3.1
    keywords:                          Earth Science > Oceans > Ocean Optics ...
    keywords_vocabulary:               NASA Global Change Master Directory (G...
    data_bins:                         16464585
    data_minimum:                      -0.009998
    data_maximum:                      0.09856601
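The lat and lon coordinates shown above are pixel centers on a regular global grid, which makes it easy to illustrate what a nearest-pixel match means. The sketch below rebuilds the centers from the grid size alone and finds the pixel closest to the first point in the points file. The helper names and the assumption of equal-width pixels spanning the full globe are illustrative; in xarray the equivalent lookup would typically be ds.sel(lat=..., lon=..., method="nearest"), and whether point-collocation does exactly that internally is not stated here.

```python
def pixel_centers(start, stop, n):
    """Centers of n equal-width pixels spanning start..stop (either direction)."""
    step = (stop - start) / n
    return [start + (k + 0.5) * step for k in range(n)]


def nearest_index(centers, value):
    """Index of the center closest to value (brute-force search)."""
    return min(range(len(centers)), key=lambda k: abs(centers[k] - value))


# Grid size from the dataset above; lat runs north to south, lon west to east.
lat_centers = pixel_centers(90.0, -90.0, 4320)
lon_centers = pixel_centers(-180.0, 180.0, 8640)

# First point in the points file.
point_lat, point_lon = 27.3835, -82.7375

i = nearest_index(lat_centers, point_lat)
j = nearest_index(lon_centers, point_lon)

print(i, j)  # 1502 2334
print(round(lat_centers[i], 4), round(lon_centers[j], 4))  # 27.3958 -82.7292
```

The matched pixel center is about 0.01 degrees from the point in each direction, well within the half-pixel width of roughly 0.02 degrees.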