Subtract values from two columns where same time (pandas, python)

Question

I have a pandas DataFrame with 3 columns of weather data: temperature, time and the name of the weather station.

It looks like this:

Time	Station_name	Temperature
2022-05-12 22:09:35+00:00	station_a	18.3
2022-05-12 22:09:42+00:00	station_b	18.0

I would like to calculate, in a new column, the temperature difference of station_a from station_b for every matching minute. The time stamps are not exactly equal but agree at minute level, and there is only one measurement every 10 minutes.

Is there a way to do this?

score 3 · Accepted Answer · answered Jul 09 '22 at 11:42

You can use a merge_asof on the two sub-dataframes:

import pandas as pd

df['Time'] = pd.to_datetime(df['Time'])
# merge_asof requires the key column to be sorted
df = df.sort_values('Time')

out = (pd
   .merge_asof(df[df['Station_name'].eq('station_a')],
               df[df['Station_name'].eq('station_b')],
               on='Time', direction='nearest',
               tolerance=pd.Timedelta('1min'),
               suffixes=('_a', '_b')
              )
   .set_index('Time')
   .eval('diff = Temperature_b - Temperature_a')
   ['diff']
)

output:

Time
2022-05-12 22:09:35+00:00   -0.3
Name: diff, dtype: float64
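
As a self-contained variant of the snippet above, here is a minimal sketch that keeps the paired readings and stores the difference in a new column. The four sample rows and the names `a`, `b` and `paired` are made up for illustration; only the `merge_asof` arguments come from the answer.

```python
import pandas as pd

# Hypothetical sample: two stations, one reading every 10 minutes,
# with timestamps a few seconds apart
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2022-05-12 22:09:35+00:00", "2022-05-12 22:09:42+00:00",
        "2022-05-12 22:19:36+00:00", "2022-05-12 22:19:41+00:00",
    ]),
    "Station_name": ["station_a", "station_b", "station_a", "station_b"],
    "Temperature": [18.3, 18.0, 17.9, 17.5],
})

# merge_asof requires both inputs to be sorted by the key
df = df.sort_values("Time")
a = df[df["Station_name"].eq("station_a")]
b = df[df["Station_name"].eq("station_b")]

# Pair each station_a reading with the nearest station_b reading
# that lies within one minute
paired = pd.merge_asof(a, b, on="Time", direction="nearest",
                       tolerance=pd.Timedelta("1min"),
                       suffixes=("_a", "_b"))

# New column holding station_b minus station_a
paired["diff"] = paired["Temperature_b"] - paired["Temperature_a"]
print(paired[["Time", "Temperature_a", "Temperature_b", "diff"]])
```

Rows of station_a without a station_b reading inside the tolerance would get NaN in `Temperature_b`, and therefore in `diff`.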

You can also round the times, but this is riskier: if one time gets rounded up and the other down, the two readings end up in different rows:

df['Time'] = pd.to_datetime(df['Time'])

(df
 .assign(Time=df['Time'].dt.round('10min'))
 .pivot(index='Time', columns='Station_name', values='Temperature')
 .eval('diff = station_b - station_a')
)

output:

Station_name               station_a  station_b  diff
Time                                                 
2022-05-12 22:10:00+00:00       18.3       18.0  -0.3
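
To illustrate the rounding risk mentioned above, here is a small sketch with two made-up timestamps only five seconds apart that fall on opposite sides of a rounding boundary. The values and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical readings five seconds apart, straddling 22:05:00
times = pd.Series(pd.to_datetime(["2022-05-12 22:04:58+00:00",
                                  "2022-05-12 22:05:03+00:00"]))

rounded = times.dt.round("10min")
print(rounded.tolist())

# The first is rounded down to 22:00, the second up to 22:10,
# so a pivot on the rounded time would not put them in the same row
same_bin = rounded.iloc[0] == rounded.iloc[1]
print(same_bin)
```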

What do you call that strategy of calling multiple methods in one block? – RoboChris, Jul 09 '22 at 12:14

score 0 · Answer 2 · edited Jul 09 '22 at 11:23

You can use pandas.Series.diff

For example:

df['Temperature_diff'] = df['Temperature'].diff()

answered Jul 09 '22 at 11:22 by Ziyad Al Qahtani · edited Jul 09 '22 at 11:23 by Gerhard

that just gives the difference to the previous elements in the series - IIUC OP wants to compare station a and b, and do that for each minute of the time series. – FObersteiner Jul 09 '22 at 11:26
In this context it even does work because my dataset has only two stations and the time is a little bit shifted - so they are listed alternately. But yes, with more stations and other timestamps it would not work. – Vilerala Jul 09 '22 at 14:50
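
The point made in these two comments can be sketched as follows, assuming exactly two stations whose rows strictly alternate when sorted by time. The sample values are made up. Only the rows belonging to station_b then hold "b minus a"; the station_a rows hold the difference to the previous station_b reading.

```python
import pandas as pd

# Hypothetical sample with strictly alternating stations
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2022-05-12 22:09:35+00:00", "2022-05-12 22:09:42+00:00",
        "2022-05-12 22:19:36+00:00", "2022-05-12 22:19:41+00:00",
    ]),
    "Station_name": ["station_a", "station_b", "station_a", "station_b"],
    "Temperature": [18.3, 18.0, 17.9, 17.5],
}).sort_values("Time")

# Difference to the previous row, whichever station that row belongs to
df["Temperature_diff"] = df["Temperature"].diff()

# Keep only the rows where the previous row was the matching station_a reading
b_minus_a = df.loc[df["Station_name"].eq("station_b"), "Temperature_diff"]
print(b_minus_a)
```

With a third station, or with a missing reading that breaks the alternation, the previous row is no longer the matching measurement and this shortcut gives wrong values.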

dantzgo · Answer 3 · 2022-07-09T12:16:57.870

If you have this pandas dataframe

from datetime import datetime
import pandas as pd

data = [{"Time":datetime(2022,5,12,22,9,35), "Station_name":"station_a", "Temperature": 18.3},
        {"Time":datetime(2022,5,12,22,9,42), "Station_name":"station_b", "Temperature": 18.0 },
        {"Time":datetime(2022,5,12,22,10,35), "Station_name":"station_a", "Temperature": 17.3},
        {"Time":datetime(2022,5,12,22,10,42), "Station_name":"station_b", "Temperature": 18.0 }]

df = pd.DataFrame(data)

Truncate to minutes: see "Truncate `TimeStamp` column to hour precision in pandas `DataFrame`"

Pivot tables / reshaping: https://pandas.pydata.org/docs/user_guide/reshaping.html

# Truncate to minutes
df["Time_trunc"] = df["Time"].values.astype('<M8[m]')

# Set index (in order to pivot) and pivot (unstack)
df = df.set_index(["Time_trunc", "Station_name"])
df_pivoted = df.unstack()

# Flatten the multi-level columns; unstack keeps the original column order,
# so the two Time columns come before the two Temperature columns
df_new = pd.DataFrame(df_pivoted.to_records())
df_new.columns = ["Time_trunc", "Time_station_a", "Time_station_b", "Temp_station_a", "Temp_station_b"]

# Add the absolute temperature difference
df_new["DiffAbs"] = abs(df_new["Temp_station_a"] - df_new["Temp_station_b"])

As I give a hint in the second part of my answer, this kind of approach will fail in cases such as 09:43 / 10:12 — mozway, Jul 09 '22 at 12:16
sorry @mozway I don't get what you mean. Would it fail when truncating to minutes? Why those times? — dantzgo, Jul 09 '22 at 12:28
It's an example. In the above case, the OP would want the two times (MM:SS) to match (they are within a minute), but truncating would give two different minute bins (9 min and 10 min) – mozway, Jul 09 '22 at 12:42
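
The 09:43 / 10:12 case from this comment thread can be reproduced with a short sketch. The date, hour and temperatures are made up, and `dt.floor("min")` is used here as a stand-in for the minute truncation in the answer.

```python
import pandas as pd

# Hypothetical single readings at 22:09:43 and 22:10:12 (29 seconds apart)
a = pd.DataFrame({"Time": pd.to_datetime(["2022-05-12 22:09:43"]),
                  "Temperature": [18.3]})
b = pd.DataFrame({"Time": pd.to_datetime(["2022-05-12 22:10:12"]),
                  "Temperature": [18.0]})

# Truncating to the minute puts the readings into different bins (22:09 vs 22:10)
bins_match = a["Time"].dt.floor("min").iloc[0] == b["Time"].dt.floor("min").iloc[0]
print(bins_match)

# A nearest-match join with a one-minute tolerance still pairs them
paired = pd.merge_asof(a, b, on="Time", direction="nearest",
                       tolerance=pd.Timedelta("1min"),
                       suffixes=("_a", "_b"))
print(paired)
```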
