Here is the data:

ID          VAR1            VAR2            VAR3
1           [12, 'a', 'ok'] [4, 'b', 'duk'] NaN
2           NaN             NaN             NaN
3           [1, 'f', 'sd']  NaN             [34, 'daa']

I want to create a new variable called MIN_VALUE that compares the first list item of each of the three variables and extracts the lowest value. This would give the following:

ID          VAR1            VAR2            VAR3            MIN_VALUE
1           [12, 'a', 'ok'] [4, 'b', 'duk'] NaN             4
2           NaN             NaN             NaN             NaN
3           [1, 'f', 'sd']  NaN             [34, 'daa']     1

I tried to create and apply the function below. I want it to be flexible in the number of variables selected (hence *args), but it doesn't work correctly:

def extract_min_value_from_first_list_item_across_multiple_columns(df, *args):
    return min(df[args][0])

df['MIN_VALUE'] = df.apply(
    extract_min_value_from_first_list_item_across_multiple_columns, 'VAR1', 'VAR2', 'VAR3', axis=1)

This raises TypeError: apply() got multiple values for argument 'axis'.

KubiK888

1 Answer

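import numpy as np

# For each row, take the first item of every non-NaN list from VAR1 onward
# and keep the smallest; rows with no lists fall back to NaN.
df["MIN_VALUE"] = df.loc[:, "VAR1":].apply(
    lambda x: min((v[0] for v in x[x.notna()]), default=np.nan), axis=1
)
print(df)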

Prints:

   ID         VAR1         VAR2       VAR3  MIN_VALUE
0   1  [12, a, ok]  [4, b, duk]        NaN        4.0
1   2          NaN          NaN        NaN        NaN
2   3   [1, f, sd]          NaN  [34, daa]        1.0
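
The TypeError in the question comes from how DataFrame.apply binds its arguments: extra positional values after the function are matched to apply's own parameters, so 'VAR1' lands on axis and then collides with axis=1. If the named-function form with a variable number of columns is preferred, one possible sketch is to pass the column names through apply's args tuple instead. The helper name min_first_item and the rebuilt sample frame are illustrative assumptions, not part of the original posts.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumed to hold real Python lists)
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "VAR1": [[12, "a", "ok"], np.nan, [1, "f", "sd"]],
    "VAR2": [[4, "b", "duk"], np.nan, np.nan],
    "VAR3": [np.nan, np.nan, [34, "daa"]],
})

def min_first_item(row, *cols):
    """Hypothetical helper: smallest first list item among the given columns."""
    # Skip NaN cells by keeping only the cells that actually hold a list
    firsts = [row[c][0] for c in cols if isinstance(row[c], list)]
    # default=np.nan covers rows where none of the columns holds a list
    return min(firsts, default=np.nan)

# Extra positional arguments for the function go in args=, not after it
df["MIN_VALUE"] = df.apply(min_first_item, axis=1, args=("VAR1", "VAR2", "VAR3"))
print(df)
```

Selecting a different set of columns only requires changing the args tuple, which keeps the function itself independent of the column layout.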
Andrej Kesely