Pandas average timestamp for DataFrame subset

Question

I am really new to pandas and I have a problem calculating the average of a set of times.

I have a CSV file with the columns Date, Time and Outside Temperature.

I imported and modified it as follows:

import pandas as pd

df = pd.read_csv("./file.csv", parse_dates=[0], dayfirst=True)
df["Date"] = pd.to_datetime(df["Date"])
df["Time"] = pd.to_datetime(df["Time"]).dt.time

I prefer to keep the date and time in two separate columns rather than use them as the index.

I have already extracted the part I need and obtained something like this:

           Date      Time  Outside Temperature
4343 2006-06-30  13:00:00                 15.9
4344 2006-06-30  13:10:00                 15.9
4345 2006-06-30  13:20:00                 15.9
4346 2006-06-30  13:30:00                 15.9
4347 2006-06-30  13:40:00                 15.9

As you can see, the same temperature occurs at different times. I would like the average of those times, something like 13:22:34.

How can I do it?

I checked other questions, such as "Average time for datetime list", and I tried several ways to access the time, for example:

print(tempdf["Time"].dt.hour)

but I obtain the error:

AttributeError: Can only use .dt accessor with datetimelike values

I think I made a mistake in the conversion to timestamp.

Do you have any suggestion?

I am using Python 3.5 and pandas 0.20.2.

Thanks a lot

Ciccio

Edit:

In the original CSV file, the time has the format hh:mm, without seconds.
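A minimal sketch of why the `.dt` error appears, using a small hypothetical Time column in place of file.csv: `.dt.time` turns the parsed values into plain Python `datetime.time` objects, which pandas stores with `object` dtype, so the `.dt` accessor is no longer available. The last lines are one possible way back, relying on the fact that `datetime.time` objects convert to `'HH:MM:SS'` strings that `pd.to_timedelta` can parse.

```python
import pandas as pd

# Hypothetical stand-in for the question's Time column ('hh:mm', no seconds)
df = pd.DataFrame({"Time": ["13:00", "13:10", "13:20"]})

# What the question does: parse the strings, then keep only the time of day
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M").dt.time

# .dt.time returns plain datetime.time objects, stored with dtype object
print(df["Time"].dtype)  # object

# ...so the .dt accessor raises on that column
try:
    df["Time"].dt.hour
except AttributeError as exc:
    print(exc)

# datetime.time objects stringify as 'HH:MM:SS', which to_timedelta can parse
mean_time = pd.to_timedelta(df["Time"].astype(str)).mean()
print(mean_time)  # 0 days 13:10:00
```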

jezrael · Accepted Answer · score 2 · 2017-07-07T07:38:16.977

I think you can convert the times to timedeltas with to_timedelta and take the mean directly, or convert them to nanoseconds, take the mean and convert back:

import numpy as np
import pandas as pd

df = pd.read_csv("./file.csv", parse_dates=[0], dayfirst=True)

a = pd.to_timedelta(df["Time"] + ':00').mean()
print (a)
0 days 13:20:00

a = pd.to_timedelta(pd.to_timedelta(df["Time"] + ':00').values.astype(np.int64).mean())
print (a)
0 days 13:20:00
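A self-contained sketch of the first variant above, assuming a hypothetical in-memory CSV read through `io.StringIO` in place of `./file.csv`, with the day-first dates and hh:mm times the question describes. Because only column 0 is listed in `parse_dates`, Time stays a string column, which is what lets `+ ':00'` work.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for ./file.csv:
# day-first dates and 'hh:mm' times, as described in the question
csv_text = """Date,Time,Outside Temperature
30/06/2006,13:00,15.9
30/06/2006,13:10,15.9
30/06/2006,13:20,15.9
30/06/2006,13:30,15.9
30/06/2006,13:40,15.9
"""

# Only column 0 (Date) is parsed, so Time stays a plain string column
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[0], dayfirst=True)

# Append ':00' to get 'hh:mm:ss', convert to timedeltas and average them
mean_time = pd.to_timedelta(df["Time"] + ":00").mean()
print(mean_time)  # 0 days 13:20:00
```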

If you need the average time for each unique date:

df['td']= pd.to_timedelta(df["Time"] + ':00').values.astype(np.int64)
df1 = pd.to_timedelta(df.groupby('Date')['td'].mean()).reset_index()
print (df1)
        Date       td
0 2006-06-30 13:20:00

...or for each unique temperature:

df['td']= pd.to_timedelta(df["Time"] + ':00').values.astype(np.int64)
df1 = pd.to_timedelta(df.groupby('Outside Temperature')['td'].mean()).reset_index()
print (df1)
   Outside Temperature       td
0                 15.9 13:20:00

...or the mean of both temperature and time for each date:

df['Time']= pd.to_timedelta(df["Time"] + ':00').values.astype(np.int64)
df1 = df.groupby('Date', as_index=False).mean()
df1['Time']= pd.to_timedelta(df1["Time"])
print (df1)
        Date     Time  Outside Temperature
0 2006-06-30 13:20:00                 15.9
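The sample above contains a single date, so the per-date grouping only produces one row. A sketch with hypothetical data spanning two dates shows that each date gets its own average; it follows the same nanosecond route as the answer, with an explicit `timedelta64[ns]` cast and a rounding step added as assumptions so the integers are always nanoseconds and convert back cleanly.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two dates so that the grouping is visible
df = pd.DataFrame({
    "Date": pd.to_datetime(["2006-06-30", "2006-06-30", "2006-07-01", "2006-07-01"]),
    "Time": ["13:00", "13:40", "09:00", "10:00"],
})

# 'hh:mm' -> 'hh:mm:ss' -> timedelta -> integer nanoseconds
td = pd.to_timedelta(df["Time"] + ":00")
df["td"] = td.values.astype("timedelta64[ns]").astype(np.int64)

# Mean per date in nanoseconds, rounded to whole ns, then back to timedeltas
mean_ns = df.groupby("Date")["td"].mean().round().astype(np.int64)
per_date = pd.to_timedelta(mean_ns).reset_index()
print(per_date)
```

Here 2006-06-30 averages 13:00 and 13:40 to 13:20:00, and 2006-07-01 averages 09:00 and 10:00 to 09:30:00.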

edited Jul 07 '17 at 07:38

answered Jul 07 '17 at 07:10

jezrael


Thanks for the answer, but I get this error: TypeError: object of type 'datetime.time' has no len(). During handling of the above exception, another exception occurred: ValueError: Invalid type for timedelta scalar: – cicciodevoto Jul 07 '17 at 07:14
You cannot convert to times or to datetimes, because the Date column is already converted in read_csv by the `parse_dates` parameter, and the Time column needs to be of string type. I added the `read_csv` call to the answer. You need to remove `df["Date"] = pd.to_datetime(df["Date"])` and `df["Time"] = pd.to_datetime(df["Time"]).dt.time` – jezrael Jul 07 '17 at 07:22
Yes, sorry, I saw your other answer later. The problem is that the CSV file doesn't have seconds; it is in the format hh:mm, and when I run it, it expects the format hh:mm:ss. Is there a way to avoid this? – cicciodevoto Jul 07 '17 at 07:31
Then you need `df['td']= pd.to_timedelta(df["Time"] + ':00').values.astype(np.int64)` – jezrael Jul 07 '17 at 07:33

Thanks a lot! Your answer was perfect! – cicciodevoto Jul 07 '17 at 09:48

score 0 · Answer 2 · edited Jun 20 '20 at 09:12


To take the mean of times, convert the time series to timedelta type and take the mean. To get the average of time and temperature for a particular date, use groupby.

Edit:

If the Time column only contains hh:mm, you need to append the seconds:

import pandas as pd

df['Time'] = pd.to_timedelta(df["Time"] + ':00')
print(df)

output:

         Date     Time  Outside Temperature
0  2006-06-30 13:00:00                 15.9
1  2006-06-30 13:10:00                 15.9
2  2006-06-30 13:20:00                 15.9
3  2006-06-30 13:30:00                 15.9
4  2006-06-30 13:40:00                 15.9

Convert Time to 64-bit integers (nanoseconds) so that it can be averaged in the groupby:

df['Time'] = df['Time'].astype('int64')

Group by the Date column and take the mean of Time and Outside Temperature:

df = df.groupby(['Date'])[['Time', 'Outside Temperature']].mean()

Now convert the Time column back to Timedelta type:

df['Time'] = pd.to_timedelta(df['Time'])

Output:

               Time  Outside Temperature
Date                                    
2006-06-30 13:20:00                 15.9
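The result is a Timedelta (an offset from midnight), while the question asks for a clock-style value such as 13:22:34. One possible way to get that, sketched under the assumption that the average is less than one day (which holds for an average of times of day), is to add the offset to midnight of an arbitrary day and keep only the time part; the `mean_td` value below is a hypothetical stand-in for the averaged Time.

```python
import pandas as pd

# Hypothetical value standing in for the averaged Time computed above
mean_td = pd.Timedelta(hours=13, minutes=20)

# Add the offset to midnight of an arbitrary day (here the question's date)
# and keep only the time-of-day part, a datetime.time object
clock = (pd.Timestamp("2006-06-30") + mean_td).time()
print(clock)  # 13:20:00
```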


answered Jul 07 '17 at 07:22

Akshay Kandul


What pandas version do you use? – jezrael Jul 07 '17 at 07:29
@jezrael it's 0.19.1, with Python 3.4.3. Installing the newest version, 0.20.1. – Akshay Kandul Jul 07 '17 at 07:31
For me, in `0.20.2` it does not work... I use Windows, and you? – jezrael Jul 07 '17 at 07:42
@jezrael, I just updated my pandas package to 0.20.2 and it still works. By the way, I am using Ubuntu. – Akshay Kandul Jul 07 '17 at 08:40

score 0 · Answer 3 · answered Jun 29 '20 at 13:15

Preparing a sample DataFrame (representing your subset):

import numpy as np
import pandas as pd

# Initiate dataframe
date_var = "date"
df = pd.DataFrame(data=[['A', '2018-08-05 17:06:01'],
                        ['A', '2018-08-05 17:06:02'],
                        ['A', '2018-08-05 17:06:03'],
                        ['B', '2018-08-05 17:06:07'],
                        ['B', '2018-08-05 17:06:09'],
                        ['B', '2018-08-05 17:06:11']],
                  columns=['column', date_var])

# Convert date-column to proper pandas Datetime-values/pd.Timestamps
df[date_var] = pd.to_datetime(df[date_var])

Extracting the desired average Timestamp value:

# Extract the numeric value associated with each timestamp (epoch time in nanoseconds)
# NOTE: this is being accomplished via accessing the .value - attribute of each Timestamp in the column
In:
[tsp.value for tsp in df[date_var]]
Out:
[
    1533488761000000000, 1533488762000000000, 1533488763000000000,
    1533488767000000000, 1533488769000000000, 1533488771000000000
]

# Use this to calculate the mean, then convert the result back to a timestamp
In:
pd.Timestamp(np.nanmean([tsp.value for tsp in df[date_var]]))
Out:
Timestamp('2018-08-05 17:06:05.500000')
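The list comprehension averages over the whole column. If you need one average per group (as with the per-date averages in the other answers), a vectorised variant of the same epoch-nanosecond idea can be combined with groupby. This is a sketch, not the answerer's code: it swaps the `.value` loop for a NumPy cast to `datetime64[ns]` and then `int64`, and rounds the float means before converting back.

```python
import numpy as np
import pandas as pd

# Same sample frame as above: two groups, 'A' and 'B'
df = pd.DataFrame(data=[['A', '2018-08-05 17:06:01'],
                        ['A', '2018-08-05 17:06:02'],
                        ['A', '2018-08-05 17:06:03'],
                        ['B', '2018-08-05 17:06:07'],
                        ['B', '2018-08-05 17:06:09'],
                        ['B', '2018-08-05 17:06:11']],
                  columns=['column', 'date'])
df['date'] = pd.to_datetime(df['date'])

# Epoch nanoseconds for the whole column at once (same numbers as the .value list)
df['ns'] = df['date'].values.astype('datetime64[ns]').astype(np.int64)

# Average per group, round to whole nanoseconds, convert back to Timestamps
mean_ns = df.groupby('column')['ns'].mean().round().astype(np.int64)
group_means = pd.to_datetime(mean_ns)
print(group_means)  # A: 2018-08-05 17:06:02, B: 2018-08-05 17:06:09
```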
