I'm trying to update some columns of a dataframe where a condition is met (only some rows will meet the condition).

I'm using apply with loc. My function returns a pandas series.

The problem is that the columns are updated with NaN.

Simplifying my problem, we can consider the following dataframe df_test:

    col1    col2    col3    col4
0   A   1   1   2
1   B   2   1   2
2   A   3   1   2
3   B   4   1   2

I now want to update col3 and col4 where col1 == 'A'. For that I use the apply method:

df_test.loc[df_test['col1']=='A', ['col3', 'col4']] = df_test[df_test['col1']=='A'].apply(lambda row: pd.Series([10,20]), axis=1)

Doing that I get:

    col1    col2    col3    col4
0   A   1   NaN NaN
1   B   2   1.0 2.0
2   A   3   NaN NaN
3   B   4   1.0 2.0

If I return np.array([10, 20]) or [10, 20] instead of pd.Series([10, 20]), I get the following error:

ValueError: shape mismatch: value array of shape (2,2) could not be broadcast to indexing result of shape (2,)

What do I need to return to obtain the following?

    col1    col2    col3    col4
0   A   1   10  20
1   B   2   1   2
2   A   3   10  20
3   B   4   1   2

thanks!

Pedro Cruz

1 Answer

You can fix this by giving the pd.Series constructor inside df.apply the correct index, like this:

df.loc[df['col1'] == 'A', ['col3', 'col4']] = df.loc[df['col1'] == 'A'].apply(lambda x: pd.Series([10,20], index=['col3', 'col4']), axis=1)

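Note that I am matching the pd.Series index to the column labels expected in the dataframe. Pandas aligns on labels in most operations, so a Series built with the default 0, 1 index matches neither col3 nor col4, and the assignment fills those cells with NaN.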
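To see the alignment at work, here is a minimal sketch that rebuilds the frame from the question as df. The names mask, unlabeled and labeled are only for illustration and do not appear in the original post.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'col1': ['A', 'B', 'A', 'B'],
    'col2': [1, 2, 3, 4],
    'col3': [1, 1, 1, 1],
    'col4': [2, 2, 2, 2],
})
mask = df['col1'] == 'A'

# Without an explicit index, the result's columns are labeled 0 and 1,
# so nothing lines up with 'col3' / 'col4' during assignment.
unlabeled = df[mask].apply(lambda row: pd.Series([10, 20]), axis=1)
print(unlabeled.columns.tolist())

# With a matching index, the result's columns are 'col3' and 'col4'.
labeled = df[mask].apply(
    lambda row: pd.Series([10, 20], index=['col3', 'col4']), axis=1
)
print(labeled.columns.tolist())

# Row labels and column labels both align, so the values land as intended.
df.loc[mask, ['col3', 'col4']] = labeled
print(df)
```

Comparing the two printed column lists shows what the assignment is trying to align on.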

Output:

  col1  col2  col3  col4
0    A     1    10    20
1    B     2     1     2
2    A     3    10    20
3    B     4     1     2
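If you would rather keep returning a plain list from the function, one possible alternative (not part of the fix above, just a sketch) is to skip label alignment altogether and assign by position. This assumes the same example frame, rebuilt here as df; mask, cols and values are illustrative names.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'col1': ['A', 'B', 'A', 'B'],
    'col2': [1, 2, 3, 4],
    'col3': [1, 1, 1, 1],
    'col4': [2, 2, 2, 2],
})
mask = df['col1'] == 'A'
cols = ['col3', 'col4']

# result_type='expand' turns each returned list into its own columns,
# which are labeled 0 and 1 by default.
values = df[mask].apply(lambda row: [10, 20], axis=1, result_type='expand')

# .to_numpy() drops the labels, so the values are assigned by position
# instead of being aligned on index and column names.
df.loc[mask, cols] = values.to_numpy()
print(df)
```

The trade-off is that positional assignment relies on the returned values being in the same order as cols, whereas the labeled Series makes that mapping explicit.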
Scott Boston