26

Is there any way to check for missing dates in a dataframe directly. I want to check if there are a missing dates between 2013-01-19 to 2018-01-29

            GWA_BTC      GWA_ETH    GWA_LTC  GWA_XLM  GWA_XRP
   Date                 
2013-01-19  11,826.36   1,068.45    195.00    0.51    1.82
2013-01-20  13,062.68   1,158.71    207.58    0.52    1.75
   ...
2018-01-28  12,326.23   1,108.90    197.36    0.48    1.55
2018-01-29  11,397.52   1,038.21    184.92    0.47    1.43

I tried to check it manually but it took a lot of time.

denis
  • 21,378
  • 10
  • 65
  • 88
Jeeth
  • 2,226
  • 5
  • 24
  • 60

5 Answers5

81

You can use DatetimeIndex.difference(other)

pd.date_range(start = '2013-01-19', end = '2018-01-29' ).difference(df.index)

It returns the elements not present in the other

Vaishali
  • 37,545
  • 5
  • 58
  • 86
4

Example:

As a minimal example, take this:

>>> df
              GWA_BTC   GWA_ETH  GWA_LTC  GWA_XLM  GWA_XRP
Date                                                      
2013-01-19  11,826.36  1,068.45   195.00     0.51     1.82
2013-01-20  13,062.68  1,158.71   207.58     0.52     1.75
2013-01-28  12,326.23  1,108.90   197.36     0.48     1.55
2013-01-29  11,397.52  1,038.21   184.92     0.47     1.43

And we can find the missing dates between 2013-01-19 and 2013-01-29

Method 1:

See @Vaishali's answer

Use .difference to find the difference between your datetime index and the set of all dates within your range:

pd.date_range('2013-01-19', '2013-01-29').difference(df.index)

Which returns:

DatetimeIndex(['2013-01-21', '2013-01-22', '2013-01-23', '2013-01-24',
               '2013-01-25', '2013-01-26', '2013-01-27'],
              dtype='datetime64[ns]', freq=None)

Method 2:

You can re-index your dataframe using all dates within your desired daterange, and find where reindex has inserted NaNs.

And to find missing dates between 2013-01-19 and 2013-01-29:

>>> df.reindex(pd.date_range('2013-01-19', '2013-01-29')).isnull().all(1)

2013-01-19    False
2013-01-20    False
2013-01-21     True
2013-01-22     True
2013-01-23     True
2013-01-24     True
2013-01-25     True
2013-01-26     True
2013-01-27     True
2013-01-28    False
2013-01-29    False
Freq: D, dtype: bool

Those values with True are the missing dates in your original dataframe

sacuL
  • 49,704
  • 8
  • 81
  • 106
  • Do i have to convert the index to datetime object? at the moment the dtype of index is object. – Jeeth Aug 27 '18 at 17:42
  • Yes, you should. `df.index = pd.to_datetime(df.index)` – sacuL Aug 27 '18 at 17:43
  • Method 1 does not work? It returns all the dates between start and end rather than displaying missing dates? – Jeeth Aug 27 '18 at 18:39
  • No, it works, but I crossed it out because @Vaishali had already answered it using that solution – sacuL Aug 27 '18 at 18:40
  • That replaces the original column(s) with True/False values. How can I keep the original columns and add the True/False as a new column? – Vega Jul 15 '20 at 09:28
2

assuming data is daily non business dates:

df.index.to_series().diff().dt.days > 1
Yuca
  • 6,010
  • 3
  • 22
  • 42
1

You can use DatetimeIndex.difference and add freq param, so you can check for missing days, hours, minutes, depending on the frequency you are using:

pd.date_range(df.index.min(), df.index.max(), freq="1min").difference(df.index)
Paul
  • 181
  • 4
  • 11
0

I can't post a comment but you can probably traverse each value and add 24 hours to the previous value to see if the date matches?

import pandas as pd

a = [1,2,3,4,5]
b = [1,0.4,0.3,0.5,0.2]

df = pd.DataFrame({'a':a , 'b': b})

for i in range(len(df)):
    prev = df.loc[i,'a']
    if i is 0:
        continue
    else:
         # Add 1 day to the current value and check with prev value
static const
  • 953
  • 4
  • 16