0

i got dataframe with column like this:

Date
3 mins
2 hours
9-Feb
13-Feb

the type of the dates is string for every row. What is the easiest way to get that dates into integer unixtime ?

  • Your data is missing the exact day, month, year and time information. Without this it is not possible to convert into unix time format – Karthick Mohanraj Feb 26 '20 at 11:29
  • hey, @KarthickMohanraj . What is the exact string type that i need so i can convert it into unix ? what if i replace -Feb with /02/2020 ? if 9/02/2020 is enough to convert, it's fine for me. I need to get the day in timestamp, not the exact time till seconds/minutes. – Alex Kalaidjiev Feb 26 '20 at 11:36
  • The Unix time is the time elapsed since Jan 1, 1970 represented in seconds. For example, If you want to calculate the Unix time at ```5.00 am on 5th of Feb 2019```, you will be able to calculate it since you have the exact time (5.00 am), date (5th), month (Feb) and year (2019) information for which you want the unix time. But in your case, you do not have these specifics available in your dataframe – Karthick Mohanraj Feb 26 '20 at 11:43

1 Answers1

1

One idea is convert columns to datetimes and to timedeltas:

df['dates'] = pd.to_datetime(df['Date']+'-2020', format='%d-%b-%Y', errors='coerce')

times = df['Date'].replace({'(\d+)\s+mins': '00:\\1:00',
                             '\s+hours': ':00:00'}, regex=True)
df['times'] = pd.to_timedelta(times, errors='coerce')

#remove rows if missing values in dates and times
df = df[df['Date'].notna() | df['times'].notna()]
df['all'] = df['dates'].dropna().astype(np.int64).append(df['times'].dropna().astype(np.int64))
print (df)
      Date      dates    times                  all
0   3 mins        NaT 00:03:00         180000000000
1  2 hours        NaT 02:00:00        7200000000000
2    9-Feb 2020-02-09      NaT  1581206400000000000
3   13-Feb 2020-02-13      NaT  1581552000000000000
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • hey @jezrael . This looks really amazing, thank you ! Got only one question : 180000000000 in normal date is Monday, September 15, 1975 which i think is not before 3 minutes :D – Alex Kalaidjiev Feb 26 '20 at 12:38
  • @AlexKalaidjiev - I add year `2020` for datetimes, if want `1975` change `df['dates'] = pd.to_datetime(df['Date']+'-2020', format='%d-%b-%Y', errors='coerce')` to `df['dates'] = pd.to_datetime(df['Date']+'-1975', format='%d-%b-%Y', errors='coerce')` – jezrael Feb 26 '20 at 12:39
  • Tried both of the ways now, but i'm still getting NaT or NaN at all three new rows. Also the last line is making error : np is not defined – Alex Kalaidjiev Feb 26 '20 at 15:02
  • @AlexKalaidjiev - I think problem should be with format of data, data are exactly like in question - day-first 3letters of months or number with `mins` or `hours` strings? Because for each different formats solution fail. – jezrael Feb 26 '20 at 15:10