4

I'm filtering my rows from dataframe named joined on multiple criterias:

joined[(joined.gvkey_x==gvkey) & (year-3<=joined.year_y<=year )]

I want all the rows from the dataframe which satisfy the given conditions. I'm returned this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I was reading this answer which states that I should use .any()/ .all() but this isnt exactly clear.

Can someone explain to me how I should apply multiple conditions and what is exactly the purpose and use of a.any() or a.all() in my operation (&).

Community
  • 1
  • 1
user248884
  • 851
  • 1
  • 11
  • 21

2 Answers2

2

You could use query

joined.query('gvkey_x == @gvkey & @year - 3 <= year_y <= @year')

demo

year = 2002
gvkey = 12345

joined = pd.DataFrame(dict(
        gvkey_x=[12343] * 6 + [12345] * 6 + [12346] * 6,
        year_y=list(range(2000, 2006)) * 3
    ))
print(joined)

    gvkey_x  year_y
0     12343    2000
1     12343    2001
2     12343    2002
3     12343    2003
4     12343    2004
5     12343    2005
6     12345    2000
7     12345    2001
8     12345    2002
9     12345    2003
10    12345    2004
11    12345    2005
12    12346    2000
13    12346    2001
14    12346    2002
15    12346    2003
16    12346    2004
17    12346    2005

print(joined.query('gvkey_x == @gvkey & @year - 3 <= year_y <= @year'))

   gvkey_x  year_y
6    12345    2000
7    12345    2001
8    12345    2002

The reason you were getting your error is because of (year-3<=joined.year_y<=year )
pandas needs to evaluate each comparison separately.

As @Psidom suggested in the comments:

(joined.year_y >= year-3) & (joined.year_y <= year)

Would fix it.

piRSquared
  • 285,575
  • 57
  • 475
  • 624
1

You can try numpy.where (using sample data):

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"year": [2013, 2014, 2014, 2015, 2013, 2013], "class": ['a', 'b', 'b', 'b', 'b', 'a'], "value": [13, 25, 32, 12, 23, 54]})
>>> df
  class  value  year
0     a     13  2013
1     b     25  2014
2     b     32  2014
3     b     12  2015
4     b     23  2013
5     a     54  2013
>>> df.ix[np.where((df['class'] == 'a') & (df['year'] == 2013))[0]]
  class  value  year
0     a     13  2013
5     a     54  2013
blacksite
  • 12,086
  • 10
  • 64
  • 109