Buildling on @ritchie46 answer, if you need to find the closest value, you can apply join_asof
in both the forward and backward direction, and choose the result that is closest.
For example, if we start with this data:
df_a = pl.DataFrame(
{"arrival_time": [time(8, 40, 10), time(8, 59, 0), time(
19, 31, 26)], "train": [112, 113, 134]}
)
df_b = pl.DataFrame(
{
"arrival_time": [
time(8),
time(8, 30, 0),
time(9),
time(16),
time(16, 30, 0),
time(17),
],
"period": [
"early morning",
"mid-morning",
"late morning",
"early afternoon",
"mid afternoon",
"late afternoon",
],
}
)
df_a
df_b
shape: (3, 2)
┌──────────────┬───────┐
│ arrival_time ┆ train │
│ --- ┆ --- │
│ time ┆ i64 │
╞══════════════╪═══════╡
│ 08:40:10 ┆ 112 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 08:59:00 ┆ 113 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 19:31:26 ┆ 134 │
└──────────────┴───────┘
>>> df_b
shape: (6, 2)
┌──────────────┬─────────────────┐
│ arrival_time ┆ period │
│ --- ┆ --- │
│ time ┆ str │
╞══════════════╪═════════════════╡
│ 08:00:00 ┆ early morning │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 08:30:00 ┆ mid-morning │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 09:00:00 ┆ late morning │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 16:00:00 ┆ early afternoon │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 16:30:00 ┆ mid afternoon │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 17:00:00 ┆ late afternoon │
└──────────────┴─────────────────┘
We see that train 113
arrives closer to late morning
than to mid-morning
. If you need to capture that, you can join with both backward
and forward
, and then choose which is closest:
(
df_a
.join_asof(df_b.with_column(pl.col('arrival_time').alias('early_time')), on="arrival_time", strategy="backward", suffix="_early")
.join_asof(df_b.with_column(pl.col('arrival_time').alias('late_time')), on="arrival_time", strategy="forward", suffix="_late")
.with_column(
pl.when(pl.col('early_time').is_null())
.then(pl.col('period_late'))
.when(pl.col('late_time').is_null())
.then(pl.col('period'))
.when((pl.col('arrival_time').cast(pl.Datetime) - pl.col('early_time').cast(pl.Datetime)) <
(pl.col('late_time').cast(pl.Datetime) - pl.col('arrival_time').cast(pl.Datetime)))
.then(pl.col('period'))
.otherwise(pl.col('period_late'))
.alias('closest')
)
)
shape: (3, 7)
┌──────────────┬───────┬────────────────┬────────────┬──────────────┬───────────┬────────────────┐
│ arrival_time ┆ train ┆ period ┆ early_time ┆ period_late ┆ late_time ┆ closest │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ time ┆ i64 ┆ str ┆ time ┆ str ┆ time ┆ str │
╞══════════════╪═══════╪════════════════╪════════════╪══════════════╪═══════════╪════════════════╡
│ 08:40:10 ┆ 112 ┆ mid-morning ┆ 08:30:00 ┆ late morning ┆ 09:00:00 ┆ mid-morning │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 08:59:00 ┆ 113 ┆ mid-morning ┆ 08:30:00 ┆ late morning ┆ 09:00:00 ┆ late morning │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 19:31:26 ┆ 134 ┆ late afternoon ┆ 17:00:00 ┆ null ┆ null ┆ late afternoon │
└──────────────┴───────┴────────────────┴────────────┴──────────────┴───────────┴────────────────┘
Note that in the when/then/otherwise
I've chosen to explicitly handle the case where a train arrives before the first period or after the last period.