
I have a DataFrame with an open time and a close time for each row, and I am trying to calculate the difference between them in milliseconds.

My code currently looks like this:

df = df.assign(Latency=lambda d: d.CloseTimeStamp - d.CreationTimeStamp)
df.Latency = df.apply(lambda d: d.Latency.total_seconds() * 1000., axis=1)

However, I'd like to know why I can't do this as a one-liner, like so:

df = df.assign(Latency=lambda d: (d.CloseTimeStamp - d.CreationTimeStamp).total_seconds() * 1000.)

When I try the latter I get AttributeError: 'Series' object has no attribute 'total_seconds'

aydow

2 Answers


total_seconds() is available through the .dt accessor of a timedelta Series, not on the Series itself, so this should work:

df = df.assign(Latency=lambda d: (d.CloseTimeStamp - d.CreationTimeStamp).dt.total_seconds() * 1000.)
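To make the role of the accessor concrete, here is a minimal sketch. The sample timestamps are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "CreationTimeStamp": pd.to_datetime(
        ["2018-01-01 00:00:00.000", "2018-01-01 00:00:01.000"]),
    "CloseTimeStamp": pd.to_datetime(
        ["2018-01-01 00:00:00.250", "2018-01-01 00:00:03.000"]),
})

# Subtracting two datetime columns gives a Series of timedeltas.
delta = df.CloseTimeStamp - df.CreationTimeStamp

# The Series itself has no total_seconds attribute (hence the AttributeError);
# the method is reached through the .dt accessor.
series_has_method = hasattr(delta, "total_seconds")

# Vectorized conversion to milliseconds.
latency_ms = delta.dt.total_seconds() * 1000.0
```

Each element of delta is a Timedelta with its own total_seconds(), which is why the row-wise apply in the question works while the call on the whole Series does not.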

That said, there's no need for a lambda function:

df = df.assign(Latency=(df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.)

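is simpler, and like the version above it is vectorized, so it is much faster than the row-wise apply in the question.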

A further remark on efficiency: df.assign() builds a completely new DataFrame object; if you intend to assign that object back to df anyway, you're better off modifying df in place:

df['Latency'] = (df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.
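
The difference between the two styles can be checked with a short sketch. The sample data and the names new_df and had_latency_before are illustrative assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "CreationTimeStamp": pd.to_datetime(["2018-01-01 00:00:00"]),
    "CloseTimeStamp": pd.to_datetime(["2018-01-01 00:00:02"]),
})
latency_ms = (df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.0

# assign() returns a new DataFrame and leaves the original untouched.
new_df = df.assign(Latency=latency_ms)
had_latency_before = "Latency" in df.columns

# Column assignment adds the column to the existing object instead.
df["Latency"] = latency_ms
```

assign() remains handy inside method chains, where a new object is wanted at each step anyway.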
Ken Wei

You need the .dt accessor because you are working with a datetime-like Series; .dt is omitted only when working with an index such as a DatetimeIndex:

df = df.assign(Latency=lambda d: (d.CloseTimeStamp - d.CreationTimeStamp).dt.total_seconds() * 1000.)

Solution without lambda:

df = df.assign(Latency=(df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.)

...and solution without assign:

df['Latency'] = (df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.
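
As a rough illustration of the Series versus index distinction mentioned at the top of this answer, the sketch below computes the same values both ways. The timestamps and variable names are made up for the example.

```python
import pandas as pd

# Hypothetical timestamps, used only for illustration.
opened = pd.DatetimeIndex(["2018-01-01 00:00:00", "2018-01-01 00:00:10"])
closed = pd.DatetimeIndex(["2018-01-01 00:00:01.500", "2018-01-01 00:00:10.020"])

# Index minus index gives a TimedeltaIndex, which exposes total_seconds() directly.
idx_delta = closed - opened
ms_from_index = idx_delta.total_seconds() * 1000.0

# The same data held in Series has to go through the .dt accessor.
ser_delta = pd.Series(closed) - pd.Series(opened)
ms_from_series = ser_delta.dt.total_seconds() * 1000.0
```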
jezrael