Selecting value of column based on the values in another column, then applying the value at each row in pandas dataframe

Question

I have to calculate the value of S, which has the formula as: S = (25400/CN) − 254

the CN value which I have to choose will depend on the amc_active condition viz 1, 2 and 3. if amc_active condition at 'index 0 or 1st row' is 1 then I have to choose the CN value from column cn1 i.e 47

and if the amc_active is 3, then I have to choose CN value as 95 from cn3 column in the 4th row and so on..

     cn1         cn2     cn3     amc_active
0     47         56       78         1
1     55         61       87         2
2     36         67       73         1
3     42         84       95         3

... ... ... ... ... ... ... ... ...
17410   42       84       96         3
17411   48       81       85         1
17412   55       59       82         1
17413   57       86       93         2
17414   36       87       91         2

for doing so, I am using else if condition

    if (df_col_all_merged['amc_active'] == 1):
        cn_for_s = df_col_all_merged['cn1']
    elif (df_col_all_merged['amc_active'] == 2):
        cn_for_s = df_col_all_merged['cn2']
    elif (df_col_all_merged['amc_active'] == 3):
        cn_for_s = df_col_all_merged['cn3']

but getting the error as

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-43-40c3b3817474> in <module>
----> 1 if (df_col_all_merged['amc_active'] == 1):
      2     cn_for_s = df_col_all_merged['cn1']
      3 elif (df_col_all_merged['amc_active'] == 2):
      4     cn_for_s = df_col_all_merged['cn2']
      5 elif (df_col_all_merged['amc_active'] == 3):

~\Anaconda3\envs\geocube\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1327 
   1328     def __nonzero__(self):
-> 1329         raise ValueError(
   1330             f"The truth value of a {type(self).__name__} is ambiguous. "
   1331             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

and from this

def select_cn(df_col_all_merged):
    result = cn_for_s
    if (df_col_all_merged['amc_active'] == 1):
        cn_for_s = df_col_all_merged['cn1']
    elif (df_col_all_merged['amc_active'] == 2):
        cn_for_s = df_col_all_merged['cn2']
    elif (df_col_all_merged['amc_active'] == 3):
        cn_for_s = df_col_all_merged['cn3']
    return result

df_col_all_merged['s_mm'] = (25400/select_cn(df_col_all_merged)) - 254

the error is

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-54-df43eddeac39> in <module>
----> 1 df_col_all_merged['s_mm'] = (num_const/select_cn(df_col_all_merged)) - dev_const

<ipython-input-51-7405a6dd24db> in select_cn(df_col_all_merged)
      1 def select_cn(df_col_all_merged):
----> 2     result = cn_for_s
      3     if (df_col_all_merged['amc_active'] == 1):
      4         cn_for_s = df_col_all_merged['cn1']
      5     elif (df_col_all_merged['amc_active'] == 2):

UnboundLocalError: local variable 'cn_for_s' referenced before assignment

How to rectify this?

Mustafa Aydın · Accepted Answer · 2021-06-14T09:15:51.360

You can use numpy's fancy indexing:

# get the values of `cn*` columns
cn_123 = df_col_all_merged[["cn1", "cn2", "cn3"]].to_numpy()

# index into it as "(row_index, amc_active_value-1)"
cn = cn_123[np.arange(len(df_col_all_merged)),
            df.amc_active-1]

# perform the formula
df_col_all_merged["s_mm"] = (25400/cn) - 254

where we index each row with the index from amc_active (but minus 1 since 0-indexed),

to get

       cn1  cn2  cn3  amc_active        s_mm
0       47   56   78           1  286.425532
1       55   61   87           2  162.393443
2       36   67   73           1  451.555556
3       42   84   95           3   13.368421
17410   42   84   96           3   10.583333
17411   48   81   85           1  275.166667
17412   55   59   82           1  207.818182
17413   57   86   93           2   41.348837
17414   36   87   91           2   37.954023

Alternatively, there is np.select in lieu of multiple if-elif's:

# form the conditions & corresponding choices
conditions = [df.amc_active.eq(1), df.amc_active.eq(2), df.amc_active.eq(3)]
choices = [df.cn1, df.cn2, df.cn3]

# select so
cn = np.select(conditions, choices)

# formula
df_col_all_merged["s_mm"] = (25400/cn) - 254

But due to convenient structure of df.amc_active this may not be the best. (also you may omit the last condition & choice and give it as the default to np.select, i.e., as an "else").

Hii Mustafa, in there is index error in 2nd line of your solution, IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices, cn = cn_123[np.arange(len(df_col_all_merged)), df_col_all_merged.amc_active-1] — Abhilash Singh Chauhan, Jun 14 '21 at 10:11
hi @AbhilashSinghChauhan, Then `amc_active` is not an integer column but strings perhaps; so please try `cn_123[np.arange(len(df_col_all_merged)), df_col_all_merged.amc_active.astype(int)-1]` — Mustafa Aydın, Jun 14 '21 at 10:15
now it is showing ValueError: Cannot convert non-finite values (NA or inf) to integer — Abhilash Singh Chauhan, Jun 14 '21 at 10:18

score 0 · Answer 2 · answered Jun 14 '21 at 09:07

0

Use DataFrame.melt for alternative for lookup:

df1 = df.melt('amc_active', ignore_index=False)

CN = df1.loc['cn' + df1['amc_active'].astype(str) == df1['variable'], 'value']
df['new'] = (25400/CN) - 254
print (df)
       cn1  cn2  cn3  amc_active         new
0       47   56   78           1  286.425532
1       55   61   87           2  162.393443
2       36   67   73           1  451.555556
3       42   84   95           3   13.368421
17410   42   84   96           3   10.583333
17411   48   81   85           1  275.166667
17412   55   59   82           1  207.818182
17413   57   86   93           2   41.348837
17414   36   87   91           2   37.954023

answered Jun 14 '21 at 09:07

jezrael

822,522
95
1,334
1,252

Hiii @jazrael, thanks for the reply, your answer seems right but , CN = df1.loc['cn' + df1['amc_active'].astype(str) == df1['variable'], 'value'] print(CN) in my case I am getting empty series, hence the value in column df['new'] is empty/np.nan – Abhilash Singh Chauhan Jun 14 '21 at 09:59
@AbhilashSinghChauhan - What is `print (df.columns.tolist())` ? – jezrael Jun 14 '21 at 10:22
The output for, print (df_col_all_merged.columns.tolist()) is [ 'amc_active', 'cn1', 'cn2', 'cn3', 'new'] – Abhilash Singh Chauhan Jun 14 '21 at 10:27
actually the 2nd line returns empty series, that's why its not generating any output in 'new' column, on printing, print(CN) the output is Series([], Name: value, dtype: object) – Abhilash Singh Chauhan Jun 14 '21 at 10:28
@AbhilashSinghChauhan - One idea. If use `df1['cols'] = 'cn' + df1['amc_active'].astype(str)` and then check `print (df1[['cols','variable']])` are values same per rows? – jezrael Jun 14 '21 at 10:30
Now i got it, actually it is adding decimal to the column that's why the value are not being matched, amc_active variable value cols 0 1.0 point_id 1 cn1.0 1 1.0 point_id 2 cn1.0 2 1.0 point_id 3 cn1.0 – Abhilash Singh Chauhan Jun 14 '21 at 10:32
output for df1['cols'] is cn1.0 and cn2.0 etc – Abhilash Singh Chauhan Jun 14 '21 at 10:33

Selecting value of column based on the values in another column, then applying the value at each row in pandas dataframe

2 Answers2