0

I have the following df1:

                      Value
2020-06-02 00:00:00   17
2020-06-03 00:00:00   10
2020-06-05 00:00:00   86

and the following df2:

                      Value
2020-06-01 00:00:00   16
2020-06-04 00:00:00   9
2020-06-05 00:00:00   86

and I know that the two dataframes have different length. How can I find the day of the first and the last couple of indices where df1.index == df2.index?

In my example, the day of the first matching couple would be 2020-06-05 00:00:00.

Zizzipupp
  • 1,301
  • 1
  • 11
  • 27

1 Answers1

3

Use Index.intersection and then use minimal and maximal values:

idx = df1.index.intersection(df2.index)
min1 = idx.min()
max1 = idx.max()
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Uhm I noticed that this is fairly slow given two dataframes of less than 100,000 rows. Is there anything I can do to increase the speed? – Zizzipupp Jul 10 '20 at 10:29
  • @Zizzipupp - Are index sorted? – jezrael Jul 10 '20 at 10:30
  • 1
    @Zizzipupp - Is possible use [this](https://numpy.org/doc/stable/reference/generated/numpy.intersect1d.html) ? like `idx = numpy.intersect1d(df1.index, df2.index)` ? – jezrael Jul 10 '20 at 10:32
  • Yes, but it requires to turn `min1` and `max1` into strings before using them as slicers in `df1.loc[min1:max1]`. If you don't do this, you may get `KeyError: "Cannot get left slice bound for non-unique label:...` – Zizzipupp Jul 10 '20 at 10:47
  • 1
    @Zizzipupp - Last should be used `min1 = pd.Timestamp(idx.min())` ? – jezrael Jul 10 '20 at 10:48
  • 1
    Correct. The exact sequence is `str(pd.Timestamp(idx.min()))`. – Zizzipupp Jul 10 '20 at 10:50