3

I have a pandas DataFrame with three columns of weather data: temperature, time, and the name of the weather station.

It looks like this:

Time                       Station_name  Temperature
2022-05-12 22:09:35+00:00  station_a     18.3
2022-05-12 22:09:42+00:00  station_b     18.0

I would like to calculate, in a new column, the temperature difference of station_a from station_b for each matching minute. The timestamps are not exactly equal, but they agree at minute level, and there is only one measurement every 10 minutes.

Is there a way to do this?

Sunderam Dubey
Vilerala

3 Answers

3

You can use a merge_asof on the two sub-dataframes:

df['Time'] = pd.to_datetime(df['Time'])

out = (pd
   .merge_asof(df[df['Station_name'].eq('station_a')],
               df[df['Station_name'].eq('station_b')],
               on='Time', direction='nearest',
               tolerance=pd.Timedelta('1min'),
               suffixes=('_a', '_b')
              )
   .set_index('Time')
   .eval('diff = Temperature_b - Temperature_a')
   ['diff']
)

output:

Time
2022-05-12 22:09:35+00:00   -0.3
Name: diff, dtype: float64
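As a self-contained illustration of the same approach, here is a minimal sketch with a few more rows. The sample values and the helper names `a` and `b` are made up for the example; the last station_a reading deliberately has no station_b partner within the tolerance, so its difference comes out as NaN.

```python
import pandas as pd

# Hypothetical sample data: two stations, one reading every 10 minutes,
# with timestamps a few seconds apart.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2022-05-12 22:09:35+00:00", "2022-05-12 22:09:42+00:00",
        "2022-05-12 22:19:35+00:00", "2022-05-12 22:19:42+00:00",
        "2022-05-12 22:29:35+00:00",  # no station_b reading nearby
    ]),
    "Station_name": ["station_a", "station_b", "station_a",
                     "station_b", "station_a"],
    "Temperature": [18.3, 18.0, 17.9, 17.5, 17.2],
})

# merge_asof requires both inputs to be sorted by the key column.
df = df.sort_values("Time")
a = df[df["Station_name"].eq("station_a")]
b = df[df["Station_name"].eq("station_b")]

# Pair each station_a reading with the nearest station_b reading,
# but only if it lies within one minute.
out = pd.merge_asof(a, b, on="Time", direction="nearest",
                    tolerance=pd.Timedelta("1min"), suffixes=("_a", "_b"))
out["diff"] = out["Temperature_b"] - out["Temperature_a"]
print(out[["Time", "Temperature_a", "Temperature_b", "diff"]])
```

Note that the resulting `Time` column holds the station_a timestamps, since station_a is the left-hand frame.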

You can also try rounding the times, but this is riskier, since one timestamp may be rounded up and the other down:

df['Time'] = pd.to_datetime(df['Time'])

(df
 .assign(Time=df['Time'].dt.round('10min'))
 .pivot(index='Time', columns='Station_name', values='Temperature')
 .eval('diff = station_b - station_a')
)

output:

Station_name               station_a  station_b  diff
Time                                                 
2022-05-12 22:10:00+00:00       18.3       18.0  -0.3
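To make the rounding risk concrete, here is a small sketch with two made-up timestamps that are only five seconds apart but sit on opposite sides of a 10-minute rounding boundary. The values are hypothetical and only meant to show the failure mode.

```python
import pandas as pd

# Hypothetical readings that straddle the 22:05:00 rounding boundary.
s = pd.Series(pd.to_datetime([
    "2022-05-12 22:04:58+00:00",
    "2022-05-12 22:05:03+00:00",
]))

rounded = s.dt.round("10min")
gap = s.iloc[1] - s.iloc[0]

print(gap)               # the readings are 5 seconds apart
print(rounded.tolist())  # yet they land in different 10-minute bins
```

After rounding, the two readings would end up on different rows of the pivoted table, so no difference could be computed for them.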
mozway
0

You can use pandas.Series.diff, which computes the difference between each row and the previous one.

For example:

df['Temperature_diff'] = df['Temperature'].diff()
Gerhard
  • That just gives the difference to the previous element in the series. IIUC, the OP wants to compare stations a and b, and do that for each minute of the time series. – FObersteiner Jul 09 '22 at 11:26
  • In this context it even does work because my dataset has only two stations and the time is a little bit shifted - so they are listed alternately. But yes, with more stations and other timestamps it would not work. – Vilerala Jul 09 '22 at 14:50
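The point made in these comments can be sketched as follows, assuming exactly two stations whose rows strictly alternate once sorted by time. The sample data and the names `prev_station` and `pairs` are hypothetical. Only the rows where a station_b reading directly follows a station_a reading hold a same-minute difference; the other rows compare readings taken about 10 minutes apart.

```python
import pandas as pd

# Hypothetical data: two stations listed alternately.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2022-05-12 22:09:35", "2022-05-12 22:09:42",
        "2022-05-12 22:19:35", "2022-05-12 22:19:42",
    ]),
    "Station_name": ["station_a", "station_b", "station_a", "station_b"],
    "Temperature": [18.3, 18.0, 17.9, 17.5],
})

df = df.sort_values("Time").reset_index(drop=True)

# Difference to the previous row, whatever station it came from.
df["Temperature_diff"] = df["Temperature"].diff()

# Keep only station_b rows that directly follow a station_a row.
prev_station = df["Station_name"].shift()
pairs = df[df["Station_name"].eq("station_b") & prev_station.eq("station_a")]
print(pairs[["Time", "Temperature_diff"]])
```

With a third station or irregular timestamps, the alternation assumption no longer holds and this filter would silently drop or mismatch readings.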
0

If you have this pandas DataFrame:

from datetime import datetime
import pandas as pd

data = [{"Time":datetime(2022,5,12,22,9,35), "Station_name":"station_a", "Temperature": 18.3},
        {"Time":datetime(2022,5,12,22,9,42), "Station_name":"station_b", "Temperature": 18.0 },
        {"Time":datetime(2022,5,12,22,10,35), "Station_name":"station_a", "Temperature": 17.3},
        {"Time":datetime(2022,5,12,22,10,42), "Station_name":"station_b", "Temperature": 18.0 }]

df = pd.DataFrame(data)

To truncate to minutes, see: Truncate `TimeStamp` column to hour precision in pandas `DataFrame`

For pivot tables / reshaping, see: https://pandas.pydata.org/docs/user_guide/reshaping.html

# Truncate the timestamps to minute precision
df["Time_trunc"] = df["Time"].values.astype('<M8[m]')

# Set the index (in order to pivot) and pivot (unstack)
df = df.set_index(["Time_trunc", "Station_name"])
df_pivoted = df.unstack()

# Flatten the multi-level columns. unstack keeps the original column order
# (Time first, then Temperature), each split by station.
df_new = pd.DataFrame(df_pivoted.to_records())
df_new.columns = ["Time_trunc", "Time_station_a", "Time_station_b", "Temp_station_a", "Temp_station_b"]

# Add the absolute difference of the temperatures
df_new["DiffAbs"] = abs(df_new["Temp_station_a"] - df_new["Temp_station_b"])


dantzgo
  • As I give a hint in the second part of my answer, this kind of approach will fail in cases such as 09:43 / 10:12 – mozway Jul 09 '22 at 12:16
  • sorry @mozway I don't get what you mean. Would it fail when truncating to minutes? Why those times? – dantzgo Jul 09 '22 at 12:28
  • It's an example. In the above case, the OP would want the two times (MM:SS) to match (they are within a minute), but truncating would give two different minute bins (9 min and 10 min). – mozway Jul 09 '22 at 12:42
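The case described in this comment thread can be reproduced with a short sketch. The two readings below are hypothetical, chosen to match the 09:43 / 10:12 example: they are 29 seconds apart but fall into different minute bins, so the truncate-and-pivot approach leaves them unpaired, while a nearest-match join with a tolerance still pairs them.

```python
import pandas as pd

# Hypothetical readings 29 seconds apart, in different minute bins.
df = pd.DataFrame({
    "Time": pd.to_datetime(["2022-05-12 22:09:43", "2022-05-12 22:10:12"]),
    "Station_name": ["station_a", "station_b"],
    "Temperature": [18.3, 18.0],
})

# Truncating to the minute puts the readings in separate bins, so the
# pivoted table has one NaN per row and no usable difference.
truncated = df.assign(Time_trunc=df["Time"].dt.floor("min"))
pivoted = truncated.pivot(index="Time_trunc", columns="Station_name",
                          values="Temperature")
print(pivoted)

# A nearest-match join with a one-minute tolerance still pairs them.
a = df[df["Station_name"].eq("station_a")]
b = df[df["Station_name"].eq("station_b")]
matched = pd.merge_asof(a, b, on="Time", direction="nearest",
                        tolerance=pd.Timedelta("1min"),
                        suffixes=("_a", "_b"))
print(matched[["Time", "Temperature_a", "Temperature_b"]])
```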