Return max of zero or value for a pandas DataFrame column

Question

I want to replace negative values in a pandas DataFrame column with zero.

Is there a more concise way to construct this expression?

df['value'][df['value'] < 0] = 0

That looks like how you'd do it in `numpy` ... I doubt there's a more concise way other than breaking it into 2 statements. — mgilson, Jun 12 '13 at 14:40

unutbu · Answer 1 · 2013-06-12T15:00:58.177

31

You could use the clip method:

import pandas as pd
import numpy as np
df = pd.DataFrame({'value': np.arange(-5,5)})
df['value'] = df['value'].clip(0, None)
print(df)

yields

edited Jun 12 '13 at 15:00

answered Jun 12 '13 at 14:44

unutbu

842,883
184
1,785
1,677

2

On a side note, you don't need the `np.inf` as the second argument. It defaults to None, which is equivalent, in this case. If you're concerned about readability with leaving the second argument out, you could use `df.value.clip_lower(0)` instead. – Joe Kington Jun 12 '13 at 14:52
For what it's worth, I assumed `pandas.DataFrame.clip` behaved like `numpy` as well. I only realized that it didn't (and you didn't need the second argument) about 5 minutes ago! :) – Joe Kington Jun 12 '13 at 14:55
@JoeKington: When `df` is a `pandas.DataFrame`, `df['values']` is a `pandas.Series`, which is a subclass of `ndarray`. – unutbu Jun 12 '13 at 14:56
True, but the `clip` method is different (has default parameters for `lower` and `upper`), and it has additional methods such as `clip_lower` and `clip_upper`. – Joe Kington Jun 12 '13 at 14:58
@JoeKington: Hm, my pandas appears to be too old then. In version 0.10.0, it inherits from `numpy.clip`. – unutbu Jun 12 '13 at 14:59
It sill behaves similar to `numpy.clip` but it automatically fills in the max or min bounds with the max or min of the data if one of the bounds is left out: https://github.com/pydata/pandas/blob/master/pandas/core/series.py#L1954 As far as I can tell, that's the case for 0.10, as well. – Joe Kington Jun 12 '13 at 15:05
This answer doesn't achieve what the OP has asked: "Is there a more concise way to construct this expression?" His original formulation is already more concise. – heltonbiker Jun 12 '13 at 15:06
@JoeKington: Indeed, you are right. I don't know what I was doing this morning that led me to think otherwise... – unutbu Jun 12 '13 at 17:55
This is almost twice as slow as `numpy.maximum`: https://stackoverflow.com/a/33000983/1840471. – Max Ghenis Jul 12 '18 at 22:57

score 27 · Answer 2 · edited Jul 12 '18 at 22:47

27

Another possibility is numpy.maximum(). This is more straight-forward to read in my opinion.

import pandas as pd
import numpy as np
df['value'] = np.maximum(df.value, 0)

It's also significantly faster than all other methods.

df_orig = pd.DataFrame({'value': np.arange(-1000000, 1000000)})

df = df_orig.copy()
%timeit df['value'] = np.maximum(df.value, 0)
# 100 loops, best of 3: 8.36 ms per loop

df = df_orig.copy()
%timeit df['value'] = np.where(df.value < 0, 0, df.value)
# 100 loops, best of 3: 10.1 ms per loop

df = df_orig.copy()
%timeit df['value'] = df.value.clip(0, None)
# 100 loops, best of 3: 14.1 ms per loop

df = df_orig.copy()
%timeit df['value'] = df.value.clip_lower(0)
# 100 loops, best of 3: 14.2 ms per loop

df = df_orig.copy()
%timeit df.loc[df.value < 0, 'value'] = 0
# 10 loops, best of 3: 62.7 ms per loop

(notebook)

edited Jul 12 '18 at 22:47

Max Ghenis

14,783
16
84
132

answered Oct 07 '15 at 19:38

Dorian B.

1,101
13
22

i think this is good, the only bad thing is that `np.max != np.maximum` which makes this bad mnemonically – maxymoo Oct 30 '16 at 23:14
@maxymoo I agree, they should have gone with the SQL `GREATEST` name to distinguish. – Max Ghenis Jul 12 '18 at 23:11
Using `np.fmax` and `np.amax` in place of `np.maximum` and `np.max` can help avoid the confusion. – Paul Rougieux Feb 08 '22 at 22:36

Jeff · Accepted Answer · 2013-06-12T15:17:45.297

21

Here is the canonical way of doing it, while not necessarily more concise, is more flexible (in that you can apply this to arbitrary columns)

In [39]: df = DataFrame(randn(5,1),columns=['value'])

In [40]: df
Out[40]: 
      value
0  0.092232
1 -0.472784
2 -1.857964
3 -0.014385
4  0.301531

In [41]: df.loc[df['value']<0,'value'] = 0

In [42]: df
Out[42]: 
      value
0  0.092232
1  0.000000
2  0.000000
3  0.000000
4  0.301531

edited Jun 12 '13 at 15:17

answered Jun 12 '13 at 14:44

Jeff

125,376
21
220
187

Much more flexible (and less obscure). – Andy Hayden Jun 12 '13 at 14:53
Though in some respects, the np.clip or np.max solutions are more easily read, I think this is the most precise answer to my original question. – bjornarneson Jun 12 '13 at 15:09
Would .ix be slightly better than .loc since .ix is the more general form? Or are there arguments in favor of loc over ix? - http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing-loc-iloc-and-ix – A.Wan Jun 10 '14 at 21:28
This is the slowest method of any answer here, ~7x slower than `np.maximum` as suggested in https://stackoverflow.com/a/33000983/1840471. – Max Ghenis Jul 12 '18 at 22:56

score 2 · Answer 4 · answered Oct 14 '18 at 10:08

Or where to check:

>>> import pandas as pd,numpy as np
>>> df = pd.DataFrame(np.random.randn(5,1),columns=['value'])
>>> df
      value
0  1.193313
1 -1.011003
2 -0.399778
3 -0.736607
4 -0.629540
>>> df['value']=df['value'].where(df['value']>0,0)
>>> df
      value
0  1.193313
1  0.000000
2  0.000000
3  0.000000
4  0.000000
>>>

score 1 · Answer 5 · answered Jul 12 '18 at 22:59

1

For completeness, np.where is also a possibility, which is faster than most answers here. The np.maximum answer is the best approach though, as it's faster and more concise than this.

df['value'] = np.where(df.value < 0, 0, df.value)

answered Jul 12 '18 at 22:59

Max Ghenis

14,783
16
84
132

score 0 · Answer 6 · answered Sep 26 '17 at 11:42

0

Let's take only values greater than zero, leaving those which are negative as NaN (works with frames not with series), then impute.

df[df > 0].fillna(0)

answered Sep 26 '17 at 11:42

Coolkau

128
1
8

Return max of zero or value for a pandas DataFrame column

6 Answers6

Linked

Related