0

Goal: I read measurement data from a .csv and convert them to a dataframe. Then I add the date information from the filename to the time string which is already in the dataframe. And the last step is to convert this string with date and time informatin into a datetime object.

First steps that worked:

import pandas as pd
filename = '2022_02_14_data_0.csv
path = 'C:/Users/ma1075116/switchdrive/100_Schaltag/100_Digitales_Model/Messungen/'
measData = pd.read_csv(path+filename, sep = '\t', header = [0,1], encoding = 'ISO-8859-1')
# add the date to the timestamp string
measData['Timestamp'] = filename[:11]+measData['Timestamp']

An object in the Dataframe measData['Timestamp'] has now exacty a string with the following pattern:

'2022_02_14_00:00:06'

Now I want to convert this string to datetime:

measData['Timestamp'] = pd.to_datetime(measData['Timestamp'], format= '%Y_%m_%d_%H:%M:%S')

This raises the error:

ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing

Why do I get this error and how can I avoid it? I am pretty shure that the format is correct.

Edit: I wrote a sample code which should do exactly the same, and it works:

filename = '2022_02_14_data_0.csv'
timestamps = {'Timestamp': ['00:00:00', '00:00:01', '00:00:04']}
testFrame = pd.DataFrame(timestamps)
testFrame['Timestamp'] = testFrame['Timestamp']#
testFrame['Timestamp'] = filename[:11]+testFrame['Timestamp']
testFrame['Timestamp'] = pd.to_datetime(testFrame['Timestamp'], format= '%Y_%m_%d_%X') 

My next step is now to check if all timestamp entries in the dataframe have the same format.

Solution: I do not understand the error but I found a working solution. Now I parse for the time already in the read_csv function and add the date information from the filename there. This works, measData(timeStamp) has now the datatype datetime64.

filename = '2022_02_14_data_0.csv'
path = 'C:/Users/ma1075116/switchdrive/100_Schaltag/100_Digitales_Model/Messungen/'
measData = pd.read_csv(path+filename, sep = '\t', header = [0,1], 
                       parse_dates=[0], # parse for the time in the first column
                       date_parser = lambda col: pd.to_datetime(filename[:11]+col, format= '%Y_%m_%d_%X'),
                       encoding = 'ISO-8859-1')
nistras
  • 1
  • 2
  • Hello, maybe you just lack an _ in your code, judging by the format of your string : `measData['Timestamp'] = pd.to_datetime(measData['Timestamp'], format= '%Y_%m_%d_%H:%M:%S')` – LittleSoap Mar 10 '22 at 14:27
  • Avoid custom format as much as you can. Opt in for reputed standard such as ISO-8601. – jlandercy Mar 10 '22 at 14:51

2 Answers2

0

Your format seems to be missing an underscore after day.

This works for me:

import pandas as pd
date_str = '2022_02_14_00:00:06'
pd.to_datetime(date_str, format= '%Y_%m_%d_%H:%M:%S')

EDIT:

This works fine for me (measData["Timestamp"] is a pd.Series):

import pandas as pd

measData = pd.DataFrame({"Timestamp": ['2022_02_14_00:00:06', '2022_02_14_13:55:06', '2022_02_14_12:00:06']})
pd.to_datetime(measData["Timestamp"], format= '%Y_%m_%d_%H:%M:%S')

The only way I found to reproduce your error is this (measData is a pd.DataFrame):

import pandas as pd

measData = pd.DataFrame({"Timestamp": ['2022_02_14_00:00:06', '2022_02_14_13:55:06', '2022_02_14_12:00:06']})
pd.to_datetime(measData, format= '%Y_%m_%d_%H:%M:%S')

So make sure that what you are putting into to_datetime is a pd.Series. If this does not help, please provide a small sample of your data.

LBrummer
  • 1
  • 2
  • It also does not work with the missing underscore. It raises the same error. I edited the original post and added the missing underscore. – nistras Mar 10 '22 at 14:46
0

You can do it like this using datetime.datetime.strptime and apply in the column.

Recreating your dataset:

    import datetime
    import pandas as pd

    data={'2016_03_29_08:15:27', '2017_03_29_08:18:27', 
    '2018_06_30_08:15:27','2019_07_29_08:15:27'}
     columns={'time'}

    df= pd.DataFrame(data=data, columns=columns)

Applying the desired transformation:

   df['time'] = df.apply(lambda row : datetime.datetime.strptime(row['time'], 
   '%Y_%m_%d_%H:%M:%S'), axis=1)