[Question] Efficiently selecting nearest time data per group in Xarray #10233
Unanswered
Parrot7483 asked this question in Q&A
Replies: 2 comments
Interesting problem! Can you write a minimal example with synthetic data that we could test out, please?
Here you go.
import numpy as np
import xarray as xr


def f(ephemeris, x):
    a = ephemeris.a.item()
    b = ephemeris.b.item()
    return a * x + b


# Function to compute values based on nearest time
def compute_nearest_time_values(ephemeris, observations, x):
    """
    Computes f(x) = a * x + b for each (id, time) pair in observations,
    using the nearest (id, time) pair from data for coefficients a and b.
    """
    result = xr.Dataset(
        {
            "value": (("id", "time"), np.empty((len(observations.id), len(observations.time))))
        },
        coords={"id": observations.id, "time": observations.time},
    )
    for identifier in observations.id:
        # Ephemeris entries for this id, with all-NaN time steps removed
        observations_id = ephemeris.sel(id=identifier).dropna(dim="time", how="all")
        for time in observations.time:
            # Find nearest entry in data
            nearest = observations_id.sel(time=time, method="nearest")
            # Compute the value
            value = f(nearest, x)
            # Store the result
            result["value"].loc[identifier, time] = value
    return result


# Dummy ids
i0, i1 = "id0", "id1"
# Dummy ephemeris
e0, e1, e2, e3 = "2025-01-01T00:00", "2025-01-01T06:00", "2025-01-01T12:00", "2025-01-01T18:00"
ephemeris = xr.Dataset(
    {
        "a": (("id", "time"), [[1, np.nan, 3, 4], [10, 20, np.nan, 40]]),
        "b": (("id", "time"), [[5, np.nan, 3, 2], [50, 40, np.nan, 20]]),
    },
    coords={"id": [i0, i1], "time": np.array([e0, e1, e2, e3], dtype="datetime64")},
)
# Dummy observation
o0, o1, o2 = "2025-01-01T02:30", "2025-01-01T06:15", "2025-01-01T12:15"
observations = xr.Dataset(
    coords={"id": [i0, i1], "time": np.array([o0, o1, o2], dtype="datetime64")},
)
x = 10
result = compute_nearest_time_values(ephemeris, observations, x)
assert float(result.sel(id=i0, time=o0).value) == 1 * x + 5
assert float(result.sel(id=i0, time=o1).value) == 3 * x + 3  # 06:00 does not exist, 12:00 is nearest
assert float(result.sel(id=i0, time=o2).value) == 3 * x + 3
assert float(result.sel(id=i1, time=o0).value) == 10 * x + 50
assert float(result.sel(id=i1, time=o1).value) == 20 * x + 40
assert float(result.sel(id=i1, time=o2).value) == 40 * x + 20  # 12:00 does not exist, 18:00 is nearest
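For comparison, here is a rough sketch of how the inner loop over time could be dropped while keeping the same results on this synthetic data. It still loops over ids in Python, but passes all observation times to `sel(..., method="nearest")` at once and recombines the pieces with `xr.concat`. The helper name `select_nearest_per_id` and its parameters are made up for illustration and are not part of Xarray; treat it as one possible approach, not a confirmed idiomatic answer.

```python
import numpy as np
import xarray as xr


def select_nearest_per_id(ephemeris, times, id_dim="id", time_dim="time"):
    """Hypothetical helper: nearest valid ephemeris entry per id for all times at once."""
    times = np.asarray(times)
    pieces = []
    for identifier in ephemeris[id_dim].values:
        # Keep only time steps where this id has at least one non-NaN variable.
        valid = ephemeris.sel({id_dim: identifier}).dropna(dim=time_dim, how="all")
        # One nearest-neighbour lookup for all observation times.
        nearest = valid.sel({time_dim: times}, method="nearest")
        # Relabel the result with the observation times instead of the ephemeris times.
        pieces.append(nearest.assign_coords({time_dim: times}))
    # The scalar id coordinate of each piece becomes the new id dimension.
    return xr.concat(pieces, dim=id_dim)


eph_times = np.array(
    ["2025-01-01T00:00", "2025-01-01T06:00", "2025-01-01T12:00", "2025-01-01T18:00"],
    dtype="datetime64[ns]",
)
obs_times = np.array(
    ["2025-01-01T02:30", "2025-01-01T06:15", "2025-01-01T12:15"],
    dtype="datetime64[ns]",
)
ephemeris = xr.Dataset(
    {
        "a": (("id", "time"), [[1, np.nan, 3, 4], [10, 20, np.nan, 40]]),
        "b": (("id", "time"), [[5, np.nan, 3, 2], [50, 40, np.nan, 20]]),
    },
    coords={"id": ["id0", "id1"], "time": eph_times},
)

x = 10
selected = select_nearest_per_id(ephemeris, obs_times)
# f(x) = a * x + b evaluated for every (id, time) pair in one expression.
value = selected["a"] * x + selected["b"]
```

The number of Python-level iterations then scales with the number of ids rather than ids times time steps, which may already be enough when there are few ids and thousands of time steps.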
Subject: Efficiently selecting nearest time data per group (sv) in Xarray
Hi Xarray community,
I'm working with GNSS data where I need to calculate satellite positions based on ephemeris data. I have two main Xarray Datasets:
- `ranges`: Contains observation data, indexed by coordinates `sv` (satellite ID, string) and `time` (datetime). This dataset defines the `(sv, time)` pairs for which I need calculations.
- `nav_data`: Contains satellite ephemeris data, also indexed by `sv` (string) and `time` (datetime). For each `sv`, there are multiple entries at different times, representing updated ephemeris messages.
Goal:
For each `(sv, time)` coordinate pair present in `ranges`, I need to select the corresponding ephemeris data from `nav_data` using the following logic:
1. Match the `sv` coordinate exactly.
2. For that `sv`, find the entry in `nav_data` with the nearest `time` to the `time` from the `ranges` pair.
Current (Slow) Approach:
I'm currently using nested loops, which is inefficient for my dataset size (potentially thousands of time steps and multiple satellites):
entire module here
Challenge & Attempts:
I need a vectorized Xarray solution to replace these loops. I've tried:
- `nav_data.reindex_like(ranges, method='nearest')`: This doesn't work as intended because `method='nearest'` is applied to both `time` (which is desired) and `sv` (which is not desired – I need an exact match for `sv`).
- `nav_data.groupby('sv').apply(...)`: I attempted to group `nav_data` by `sv` and then use `reindex` or `sel` with `method='nearest'` within the applied function, something like `lambda ds: ds.reindex(time=ranges.time, method='nearest')`. However, I ran into issues getting this to work correctly, possibly related to handling the coordinates and combining the results back.
Question:
What is the idiomatic Xarray way to efficiently perform this grouped nearest-neighbor lookup? Specifically, how can I select data from `nav_data` based on the `(sv, time)` coordinates in `ranges`, ensuring an exact match on `sv` and a nearest match on `time` for each group?
Thanks for any guidance or suggestions!
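To make the two matching rules concrete (exact match on `sv`, nearest valid match on `time`), here is a minimal NumPy sketch of the lookup with no Python loops. It is not an Xarray API: `nearest_valid_index` is a hypothetical helper, the input arrays are assumed to come from something like `nav_data.time.values`, `ranges.time.values` and a per-variable `notnull()` mask, and the numbers are illustrative rather than real ephemeris data. It also builds an intermediate array of shape `(n_sv, n_obs, n_eph)`, so it trades memory for speed.

```python
import numpy as np


def nearest_valid_index(eph_times, obs_times, valid):
    """Hypothetical helper: nearest valid ephemeris position per (sv, observation).

    eph_times: (n_eph,) datetime64 array of ephemeris times
    obs_times: (n_obs,) datetime64 array of observation times
    valid:     (n_sv, n_eph) boolean array, True where an ephemeris entry exists
    Returns an (n_sv, n_obs) integer array of positions along eph_times.
    """
    eph = np.asarray(eph_times, dtype="datetime64[ns]")
    obs = np.asarray(obs_times, dtype="datetime64[ns]")
    # Absolute time difference in seconds, shape (n_obs, n_eph).
    dist = np.abs(obs[:, None] - eph[None, :]) / np.timedelta64(1, "s")
    # Missing entries get infinite distance so they are never selected,
    # shape (n_sv, n_obs, n_eph). The sv axis is never mixed: exact match on sv.
    masked = np.where(valid[:, None, :], dist[None, :, :], np.inf)
    # Position of the smallest distance along the ephemeris axis.
    return masked.argmin(axis=-1)


eph_times = np.array(
    ["2025-01-01T00:00", "2025-01-01T06:00", "2025-01-01T12:00", "2025-01-01T18:00"],
    dtype="datetime64[ns]",
)
obs_times = np.array(
    ["2025-01-01T02:30", "2025-01-01T06:15", "2025-01-01T12:15"],
    dtype="datetime64[ns]",
)
# Illustrative coefficient array with dims (sv, time); NaN marks a missing message.
a = np.array([[1, np.nan, 3, 4], [10, 20, np.nan, 40]])

idx = nearest_valid_index(eph_times, obs_times, ~np.isnan(a))
# Gather the selected entry for every (sv, observation) pair.
a_nearest = np.take_along_axis(a, idx, axis=1)
```

The same `idx` can be reused with `np.take_along_axis` for every ephemeris variable that shares the `(sv, time)` layout. If a satellite has no valid entry at all, `argmin` falls back to position 0 and the gathered value is NaN.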