0

I have two datetime ranges in pandas dataframe. The resolution is one minute e.g of datetime 2021-03-26 10:23:00.

The dataframe A has the correct datetimes but the dataframe B is missing some datetimes. Is there any way to find which datetimes are missing from dataframe B and create a new dataframe C with the observations of dataframe A but with the datetimes of dataframe B?

Dataframe B

Dates Car
2021-03-26 10:23:00 1
2021-03-26 10:24:00 0
2021-03-26 10:26:00 1
2021-03-26 10:29:00 0

Dataframe A

Dates
2021-03-26 10:23:00
2021-03-26 10:24:00
2021-03-26 10:25:00
2021-03-26 10:26:00
Giorgos
  • 35
  • 2
  • 9

2 Answers2

0

If there are DatetimeIndex in both DataFrames use:

A.index = pd.to_datetime(A.index)
B.index = pd.to_datetime(B.index)

C = A[~A.index.isin(B.index)]

If there are dates columns in both DataFrames use:

A['Dates']= pd.to_datetime(A['Dates'])
B['Dates']= pd.to_datetime(B['Dates'])

C = A[~A['Dates'].isin(B['Dates'])]

print (C)
                Dates  Car
3 2021-03-26 10:29:00    0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • it returns an empty dataframe but there are approximately 122k same dates. – Giorgos Jul 08 '21 at 13:19
  • @Giorgos - Which solution was used? – jezrael Jul 08 '21 at 13:25
  • @Giorgos - Are datetimes in both? – jezrael Jul 08 '21 at 13:26
  • I used the second solution. – Giorgos Jul 08 '21 at 13:28
  • Yes both are datetimes – Giorgos Jul 08 '21 at 13:29
  • @Giorgos - So if try `to_datetime` before still not working? Like in changed answer. – jezrael Jul 08 '21 at 13:30
  • Yes, still not working. needed to to pd.to_datetime(B.stack()).unstack()) to overcome an error. – Giorgos Jul 08 '21 at 13:43
  • @Giorgos - Can you add some small data sample to question? there is only one column `date` or multiple columns filled by dates? – jezrael Jul 08 '21 at 13:45
  • I add a small sample of both dataframes. The dataframe B consist of 5 columns but only one column is datetime. Dataframe A consist of just 1 column, the datetime column. – Giorgos Jul 08 '21 at 14:33
  • @Giorgos - Testing and working well with sample data, added output to answer. If not working is possible some another format of data? Why is necessary use `pd.to_datetime(B.stack()).unstack())` here? – jezrael Jul 09 '21 at 05:21
  • Because I am getting ''ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing'' with just pd.to_datetime(B). The format is datetime64 checked with B.dtypes. – Giorgos Jul 09 '21 at 05:26
  • @Giorgos - Understand. Still not working? Datetimes are converted correctly? – jezrael Jul 09 '21 at 05:27
  • No its not working, it returns a dataframe now with approx. 40k observations while the different datimes are approx. 14k. – Giorgos Jul 09 '21 at 08:14
  • @Giorgos - hmm, so data related problem. Hard to know what is problem here, with small data sample working solution very well – jezrael Jul 09 '21 at 08:16
  • find out that pandas confuses days<13 with months. I try to solve it with dayfirst=True argument and several other methods I found in the forum like this (https://stackoverflow.com/questions/50367656/python-pandas-pandas-to-datetime-is-switching-day-month-when-day-is-less-t/50372326) but still same. – Giorgos Jul 12 '21 at 08:48
  • @Giorgos - So it means there is part of datetimes swapped? And part not? If yes, it is impossible solve. – jezrael Jul 12 '21 at 08:52
  • @Giorgos - Because pandas not know if `02-05-2010` is May or February. – jezrael Jul 12 '21 at 08:52
  • Basically yes, pandas confuses days<13 with months. – Giorgos Jul 14 '21 at 07:16
0
C = A[~A['date'].isin(B['date'])]

You can use it if there are dates columns in each Dataframe

Hamza
  • 530
  • 5
  • 27