0

I am looking for adding a 1 in an InRange cell if any date in ipu is between the date range in Date_range and a 0 if not.

List of tuples:

ipu = [('2016-05-31', '5,000'),('2015-03-06', '2,920')]

df:

          Date                Date_range  InRange
0   2016-07-26  [2016-07-26, 2016-04-26]      NaN
1   2016-04-26  [2016-04-26, 2016-01-26]      NaN
2   2016-01-26  [2016-01-26, 2015-10-27]      NaN
3   2015-10-27  [2015-10-27, 2015-07-21]      NaN
4   2015-07-21  [2015-07-21, 2015-04-27]      NaN
5   2015-04-27  [2015-04-27, 2015-01-27]      NaN
6   2015-01-27  [2015-01-27, 2014-10-20]      NaN
7   2014-10-20  [2014-10-20, 2014-07-22]      NaN
8   2014-07-22  [2014-07-22, 2014-04-23]      NaN
9   2014-04-23  [2014-04-23, 2014-01-27]      NaN
10  2014-01-27  [2014-01-27, 2013-10-28]      NaN
11  2013-10-28  [2013-10-28, 2013-07-23]      NaN
12  2013-07-23                       NaN      NaN

Desired output:

          Date                Date_range  InRange
0   2016-07-26  [2016-07-26, 2016-04-26]      1
1   2016-04-26  [2016-04-26, 2016-01-26]      0
2   2016-01-26  [2016-01-26, 2015-10-27]      0
3   2015-10-27  [2015-10-27, 2015-07-21]      0
4   2015-07-21  [2015-07-21, 2015-04-27]      0
5   2015-04-27  [2015-04-27, 2015-01-27]      1
6   2015-01-27  [2015-01-27, 2014-10-20]      0
7   2014-10-20  [2014-10-20, 2014-07-22]      0
8   2014-07-22  [2014-07-22, 2014-04-23]      0
9   2014-04-23  [2014-04-23, 2014-01-27]      0
10  2014-01-27  [2014-01-27, 2013-10-28]      0
11  2013-10-28  [2013-10-28, 2013-07-23]      0
12  2013-07-23                       NaN      0

This is the code that I tried:

for i in range(len(df.index)-1):
    for r in range(len(ipu)):
        df.loc[(df['Date_range'][i][0] > ipu[r][0]) & (df['Date_range'][i][1] <= ipu[r]), 'InRange'[i]] = 1

I am getting:

KeyError: 'cannot use a single bool to index into setitem'

I looked at

KeyError when using boolean filter on pandas data frame

but was not clear how to apply it to my case.

I know I am halfway with my code but I am stuck ...I appreciate some orientation.

Thanks

Diego
  • 637
  • 3
  • 10
  • 24
  • What is `i` in your loop? – IanS Oct 09 '17 at 14:45
  • @IanS, I added the for loop line using i. – Diego Oct 09 '17 at 14:56
  • At a basic level, you can only index a dataframe with a boolean vector of the same length. Here, your expression returns a single boolean value (`True` or `False`) so indexing fails. – IanS Oct 09 '17 at 14:58
  • Thank you @IanS. What type of solution should I look for? – Diego Oct 09 '17 at 15:07
  • @IanS I tried using the date ranges as a list and applying a for loop together with a for loop for the date tuples but was not succesfull either (and think it is very inefficient computationally ) – Diego Oct 09 '17 at 15:13
  • @IanS, this is the post for that question: https://stackoverflow.com/questions/46637111/dates-in-list-of-tuples-between-dates-in-list-of-lists-to-a-dictionary – Diego Oct 09 '17 at 15:14

1 Answers1

1

Its generally a bad idea to have lists as values in a column. Apply pd.Series to the Date_range to make a proper dataframe and then use that result to check against each of the dates in ipu.

df_dates = df['Date_range'].apply(lambda x: pd.Series(x, index=['end', 'start']))

          end      start
0  2016-07-26 2016-04-26
1  2016-04-26 2016-01-26
2  2016-01-26 2015-10-27
3  2015-10-27 2015-07-21
4  2015-07-21 2015-04-27
5  2015-04-27 2015-01-27
6  2015-01-27 2014-10-20
7  2014-10-20 2014-07-22
8  2014-07-22 2014-04-23
9  2014-04-23 2014-01-27
10 2014-01-27 2013-10-28
11 2013-10-28 2013-07-23
12        NaT        NaT

dates = pd.to_datetime([d[0] for d in ipu])
sum([(d >= df_dates.start) & (d <= df_dates.end) for d in dates]).gt(0).astype(int)

0     1
1     0
2     0
3     0
4     0
5     1
6     0
7     0
8     0
9     0
10    0
11    0
12    0
dtype: int64
Ted Petrou
  • 59,042
  • 19
  • 131
  • 136