Python : Adding conditional column to pandas dataframe, more pythonic solution?

Question

I'm adding a column to a dataframe where the column values are determined by comparing two other columns in the dataframe. The code to add the column is:

lst = []
for x in range(len(df)):
    if df['ColumnA'][x] > df['ColumnB'][x]:
        lst.append(df['ColumnB'][x])
    else:
        lst.append(df['ColumnA'][x])

df['ColumnC'] = lst

My question is, is there a more efficient/pythonic way to do this? I have been advised in the past to be wary if I'm every looping through every row of a dataframe, so wanted to see if I was missing something. Thank you!

Possible duplicate of [Find the max of two or more columns with pandas](https://stackoverflow.com/questions/12169170/find-the-max-of-two-or-more-columns-with-pandas) — G. Anderson, Feb 01 '19 at 18:14

score 6 · Accepted Answer · answered Feb 01 '19 at 18:12

6

Yes, just take the minimum:

df['ColumnC'] = df[['ColumnA', 'ColumnB']].min(1)

answered Feb 01 '19 at 18:12

jpp

159,742
34
281
339

do you know how this compares perf. wise to np.where off the top? No worries if not, good solution! – Arjun Arun Feb 01 '19 at 18:27
Probably `np.where`, but don't prematurely optimize.. Write good, understandable code. Then discover bottlenecks and optimize them. – jpp Feb 01 '19 at 18:31

score 2 · Answer 2 · answered Feb 01 '19 at 18:13

2

Use numpy.where

df['ColumnC'] = np.where(df['ColumnA'] > df['ColumnB'], df['ColumnB'], df['ColumnA'])

answered Feb 01 '19 at 18:13

Sociopath

13,068
19
47
75

score 0 · Answer 3 · answered Feb 01 '19 at 18:17

0

A bit more code than other solutions but arguably more generalizable

mask = df[ColumnA] > df[ColumnB]
df[ColumnC] = pd.Series(index=df.index)
df[ColumnC].loc[mask] = df[ColumnA].loc[mask]
df[ColumnC].loc[~mask] = df[ColumnB].loc[~mask]

answered Feb 01 '19 at 18:17

Alex

12,078
6
64
74

Python : Adding conditional column to pandas dataframe, more pythonic solution?

3 Answers3