
I have the following DataFrame:

import numpy as np
import pandas as pd

df=pd.DataFrame(np.random.randn(6,3),index=list("ABCDEF"),columns=list("XYZ"))
df.reset_index(inplace=True)
df

I want to add a new column named "Q" whose values are calculated from the labels in the "index" column, according to the following three conditions:

conditions=[(df["index"]== "A"|"B"|"C"|"D"),(df["index"]== "E"),(df["index"]== "F")]
returned_value=[df["X"]+df["Y"],df["Y"]*2,df["Z"]]

So I was thinking of using

df["Q"]=np.select(conditions, returned_value)

However, I got an error as soon as I defined the conditions. I first used or, which raised a different error, and then switched to |, which gave the error below. Any hints on how I can achieve what I want?

TypeError: unsupported operand type(s) for |: 'str' and 'str'
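The error comes from operator precedence: in Python, | binds more tightly than ==, so "A"|"B"|"C"|"D" is evaluated first, and str objects do not support |. Using or fails for a different reason: it asks for the truth value of a whole boolean Series, which pandas refuses as ambiguous. A minimal sketch reproducing both errors on a toy one-column frame (an assumption standing in for the asker's data) and showing that parenthesizing each comparison gives the intended element-wise OR:

```python
import pandas as pd

# Toy frame standing in for the asker's data after reset_index()
df = pd.DataFrame({"index": list("ABCDEF")})

# `|` binds more tightly than `==`, so this is parsed as
# df["index"] == ("A" | "B"), and str objects have no `|` operator.
pipe_error = None
try:
    df["index"] == "A" | "B"
except TypeError as exc:
    pipe_error = exc
    print("TypeError:", exc)

# `or` calls bool() on a whole boolean Series, which pandas rejects as ambiguous.
or_error = None
try:
    (df["index"] == "A") or (df["index"] == "B")
except ValueError as exc:
    or_error = exc
    print("ValueError:", exc)

# Wrapping each comparison in parentheses makes `|` combine two boolean
# Series element-wise, which is the kind of condition np.select expects.
s = df["index"]
mask = (s == "A") | (s == "B") | (s == "C") | (s == "D")
print(mask.tolist())  # [True, True, True, True, False, False]
```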
xiaoshir

1 Answer

Use isin to check membership against multiple values:

import numpy as np
import pandas as pd

np.random.seed(1213)
df=pd.DataFrame(np.random.randn(6,3),index=list("ABCDEF"),columns=list("XYZ"))
df.reset_index(inplace=True)

conditions=[df["index"].isin(["A","B","C","D"]),(df["index"]== "E"),(df["index"]== "F")]
returned_value=[df["X"]+df["Y"],df["Y"]*2,df["Z"]]
df["Q"]=np.select(conditions, returned_value)
print(df)
  index         X         Y         Z         Q
0     A  0.511604 -0.217660 -0.521060  0.293943
1     B  1.253270  1.104554 -0.770309  2.357825
2     C  0.632975 -1.322322 -0.936332 -0.689347
3     D  0.436361  1.233744  0.527565  1.670105
4     E -0.369576  1.820059 -1.373630  3.640118
5     F -0.414554 -0.098443  0.904791  0.904791
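A minimal sketch, rebuilding the answer's DataFrame, that checks isin yields the same mask as chaining == comparisons with |, and illustrates that np.select takes each row's value from the first condition that is True. The overlapping conditions are hypothetical, chosen only to show the ordering rule:

```python
import numpy as np
import pandas as pd

np.random.seed(1213)
df = pd.DataFrame(np.random.randn(6, 3), index=list("ABCDEF"), columns=list("XYZ"))
df.reset_index(inplace=True)
labels = df["index"]

# isin builds the same boolean mask as chaining == comparisons with |
via_isin = labels.isin(["A", "B", "C", "D"])
via_or = (labels == "A") | (labels == "B") | (labels == "C") | (labels == "D")
print(via_isin.equals(via_or))  # True

# np.select takes each row's value from the FIRST condition that is True,
# so overlapping conditions are resolved by their order in the list.
overlapping = [labels.isin(["A", "B"]), labels == "A"]
q = np.select(overlapping, [df["X"], df["Y"]])
print(q[0] == df.loc[0, "X"])  # True: row "A" matches both, the first wins
```

Rows matching neither condition (C to F here) fall back to np.select's default of 0.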

But resetting the index is not necessary; you can check df.index directly:

np.random.seed(1213)
df=pd.DataFrame(np.random.randn(6,3),index=list("ABCDEF"),columns=list("XYZ"))

conditions=[df.index.isin(["A","B","C","D"]),(df.index == "E"),(df.index == "F")]
returned_value=[df["X"]+df["Y"],df["Y"]*2,df["Z"]]
df["Q"]=np.select(conditions, returned_value)
print(df)
          X         Y         Z         Q
A  0.511604 -0.217660 -0.521060  0.293943
B  1.253270  1.104554 -0.770309  2.357825
C  0.632975 -1.322322 -0.936332 -0.689347
D  0.436361  1.233744  0.527565  1.670105
E -0.369576  1.820059 -1.373630  3.640118
F -0.414554 -0.098443  0.904791  0.904791
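np.select fills rows that match none of the conditions with its default argument, which is 0 unless you pass something else. A minimal sketch assuming an extra, hypothetical label "G" that no condition covers, using default=np.nan so unmatched rows stand out instead of silently becoming 0:

```python
import numpy as np
import pandas as pd

np.random.seed(1213)
# "G" is a hypothetical extra label that none of the three conditions covers
df = pd.DataFrame(np.random.randn(7, 3), index=list("ABCDEFG"), columns=list("XYZ"))

conditions = [df.index.isin(["A", "B", "C", "D"]), df.index == "E", df.index == "F"]
returned_value = [df["X"] + df["Y"], df["Y"] * 2, df["Z"]]

# Without `default`, the unmatched row would get 0; NaN makes it easy to spot.
df["Q"] = np.select(conditions, returned_value, default=np.nan)
print(df["Q"].isna().tolist())  # [False, False, False, False, False, False, True]
```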
jezrael
  • Thanks for the quick answer. Actually I have simplified the question a bit: I have a multiindex df, so the index column is actually on the 2nd level of my index. How can I specify that? – xiaoshir Apr 03 '18 at 09:57
  • then use `df.index.get_level_values(1)`, check [`get_level_values`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.get_level_values.html) – jezrael Apr 03 '18 at 09:58
  • Thank you so much, it worked. Sorry for asking such basic questions, but I feel I have to: even after reading all the basic introductions to pandas, I still run into specific problems that I can't solve, or I can't recall all the details I read. It goes like this: I run into a specific problem -> ask it here -> get your kind answers, which I really learn from and digest... Anyway, thank you very much, as always OTL. – xiaoshir Apr 04 '18 at 08:58
  • @edge27 - You are welcome! Feel free to upvote the answer too :) – jezrael Apr 04 '18 at 09:00
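A minimal sketch of the get_level_values(1) suggestion from the comments. The two-level index below (an outer level "group" holding a single placeholder value "g1", with the labels A to F on level 1, named "label") is a hypothetical stand-in for the asker's real MultiIndex, which the thread does not show:

```python
import numpy as np
import pandas as pd

# Hypothetical MultiIndex: the labels A-F sit on level 1, as the asker described
idx = pd.MultiIndex.from_product([["g1"], list("ABCDEF")], names=["group", "label"])
df = pd.DataFrame(np.random.randn(6, 3), index=idx, columns=list("XYZ"))

# Pull out the second level; get_level_values("label") would work equally well
level1 = df.index.get_level_values(1)

conditions = [level1.isin(["A", "B", "C", "D"]), level1 == "E", level1 == "F"]
returned_value = [df["X"] + df["Y"], df["Y"] * 2, df["Z"]]
df["Q"] = np.select(conditions, returned_value)
print(df)
```

Selecting the level by name rather than position keeps the code working if the level order changes.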