How to compare values in the first-positioned list item across multiple columns in Python Pandas?

Question

Here is the data

ID          VAR1            VAR2            VAR3
1           [12, 'a', 'ok'] [4, 'b', 'duk'] NaN
2           NaN             NaN             NaN
3           [1, 'f', 'sd']  NaN             [34, 'daa']

I want to create a new variable called MIN_VALUE that compares the first list item of all three variables and extracts the lowest value. This would give the following:

ID          VAR1            VAR2            VAR3            MIN_VALUE
1           [12, 'a', 'ok'] [4, 'b', 'duk'] NaN             4
2           NaN             NaN             NaN             NaN
3           [1, 'f', 'sd']  NaN             [34, 'daa']     1

I tried to create and apply a function as shown below, and I want it to be flexible in the number of variables selected (hence the *args), but it doesn't work:

def extract_min_value_from_first_list_item_across_multiple_columns(df, *args):
    return min(df[args][0])

df['MIN_VALUE'] = df.apply(
    extract_min_value_from_first_list_item_across_multiple_columns, 'VAR1', 'VAR2', 'VAR3', axis=1)

This results in the error: TypeError: apply() got multiple values for argument 'axis'.

Please post `data.to_dict()` in the question body; it's hard to replicate lists otherwise — anky, Apr 23 '21 at 16:27

score 1 · Accepted Answer · answered Apr 23 '21 at 16:50

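import numpy as np

# For each row, take the first item of every non-NaN list from VAR1 onward
# and keep the smallest; rows with no lists fall back to NaN.
df["MIN_VALUE"] = df.loc[:, "VAR1":].apply(
    lambda x: min((v[0] for v in x[x.notna()]), default=np.nan), axis=1
)
print(df)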

Prints:

   ID         VAR1         VAR2       VAR3  MIN_VALUE
0   1  [12, a, ok]  [4, b, duk]        NaN        4.0
1   2          NaN          NaN        NaN        NaN
2   3   [1, f, sd]          NaN  [34, daa]        1.0
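
As for the TypeError in the question: extra positional arguments given to DataFrame.apply after the function are bound to apply's own parameters, the first of which is axis, so 'VAR1' is taken as axis and then collides with axis=1. Arguments meant for the function go in args=(...) instead. Below is a minimal sketch of a variant that keeps the *args flexibility the question asks for, assuming every cell is either a Python list or NaN. The helper name min_first_item is hypothetical (not part of pandas), and the sample frame is rebuilt from the table in the question.

```python
import numpy as np
import pandas as pd


def min_first_item(row, *cols):
    """Return the smallest first element among the list-valued cells in cols.

    Assumes each cell is either a Python list or NaN; non-list cells
    and empty lists are skipped.
    """
    firsts = [row[c][0] for c in cols if isinstance(row[c], list) and row[c]]
    # default=np.nan covers rows where none of the columns holds a list
    return min(firsts, default=np.nan)


# Sample data rebuilt from the table in the question
df = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "VAR1": [[12, "a", "ok"], np.nan, [1, "f", "sd"]],
        "VAR2": [[4, "b", "duk"], np.nan, np.nan],
        "VAR3": [np.nan, np.nan, [34, "daa"]],
    }
)

# Column names are forwarded to the helper through args=, not positionally
df["MIN_VALUE"] = df.apply(min_first_item, args=("VAR1", "VAR2", "VAR3"), axis=1)
print(df)
```

Naming the columns explicitly, rather than slicing with df.loc[:, "VAR1":], lets the same helper be reused on any subset of columns regardless of their order in the frame.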
