1

I have 3 different datasets that I'm reading in with pd.read_csv. One of the columns of the data is time in seconds and I would like to use a function that I made for the pd.read_csv date_parser argument. It works fine when all the data is integers. However, the function I made doesn't work when I have a string or float. I think the problem is occuring at datetime.datetime.fromtimestamp(float(time_in_secs) part of my function. Does anybody know how I can get this to work for all my datasets. I'm completely stuck. I put a sample below of how the 3 different datasets look.

dataset 1

555, 1404803485, 800

555, 1408906759, 900

dataset 2

231, 1404803485, pass

231, 1404803490, fail

dataset 3

16010925, 1403890894, 40.5819880696

16010925, 1903929273, 40.5819880696

def dateparse(time_in_secs):

if isinstance(time_in_secs, str):
    if time_in_secs == '\\N':
        time_in_secs = 0

tm = datetime.datetime.fromtimestamp(float(time_in_secs))
tm = tm - datetime.timedelta(
    minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
return tm


pd.read_csv('dataset_here.csv',
           delimiter=',', index_col=[0,1], parse_dates=['Timestamp'], 
                date_parser=dateparse, names=['Serial', 'Timestamp', 'result'])
Community
  • 1
  • 1
zipline86
  • 561
  • 2
  • 7
  • 21

1 Answers1

2

I believe need convert time to 0 for all strings, for floats your solution working nice:

def dateparse(time_in_secs):

    if isinstance(time_in_secs, str):
        #https://stackoverflow.com/a/45372194
        #time_in_secs = 86400
        time_in_secs = 0

    #print (time_in_secs)
    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    tm = tm - datetime.timedelta(
    minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm

More general solution - try convert value to float and if not possible assign default value:

def dateparse(time_in_secs):

    if isinstance(time_in_secs, str):
        try:
            time_in_secs = float(time_in_secs)
        except ValueError:
            #https://stackoverflow.com/a/45372194
            #time_in_secs = 86400
            time_in_secs = 0

    #print (time_in_secs)
    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    tm = tm - datetime.timedelta(
    minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm

Sample: Tested under windows:

import pandas as pd
import datetime

def dateparse(time_in_secs):

    if isinstance(time_in_secs, str):
        try:
            time_in_secs = float(time_in_secs)
        except ValueError:
            #https://stackoverflow.com/a/45372194
            #time_in_secs = 0
            time_in_secs = 86400

    print (time_in_secs)
    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    tm = tm - datetime.timedelta(
    minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm

temp=u"""16010925,test,40.5819880696
16010925,1903929273,40.5819880696"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), index_col=[0,1], parse_dates=['Timestamp'], 
                date_parser=dateparse, names=['Serial', 'Timestamp', 'result'])

print (df)
                                 result
Serial   Timestamp                     
16010925 1970-01-02 01:00:00  40.581988
         2030-05-02 07:10:00  40.581988

print (df.index.get_level_values(1))
DatetimeIndex(['1970-01-02 01:00:00', '2030-05-02 07:10:00'], 
              dtype='datetime64[ns]', name='Timestamp', freq=None)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252