Compare values under multiple conditions of one column in Python

Question

I have the following data:

import pandas as pd

data = {
    "index": [1, 2, 3, 4, 5],
    "name": ["A", "A", "B", "B", "B"],
    "type": ['s1', 's2', 's1', 's2', 's3'],
    'value': [20, 10, 18, 32, 25]
}
df = pd.DataFrame(data)

I need to check whether the values under the same name satisfy the constraint s1 < s2 < s3 (there are only three types, and not all of them exist under every name). That is, under the same name, if the value of s1 is smaller than that of s2 or s3, return True; if the value of s2 is smaller than that of s3, return True. Otherwise, return False or NaN. Here is the output I expect:

    index   name    type    value   result
0     1      A       s1      20      False
1     2      A       s2      10        
2     3      B       s1      18      True
3     4      B       s2      32      False
4     5      B       s3      25

How can I do it in Python? Thanks for your help.

Why are there dashes in some rows and `False`s in some other rows? What is the formula/algorithm for calculating each `result`? — DYZ, Dec 29 '18 at 07:15
@DYZ For instance, under A there are only s1. It would return dash or NaN if you like. — ah bon, Dec 29 '18 at 07:18
Your question is unclear. _for instance_ isn't good enough. What _exactly_ is the condition for `True`, `False`, and dash - for each of the outcomes separately? Once you have the formula, it is easy to code it. — DYZ, Dec 29 '18 at 07:19
@DYZ A s1 return False because s1 is not smaller than s2 in example. Same reason for B s2. — ah bon, Dec 29 '18 at 07:20
Ok, what about `True` and a dash? Also, can you have more than one `s1`, `s2` or `s3` per name? — DYZ, Dec 29 '18 at 07:21
@DYZ group by `name`, now say there are three type `s1, s2 and s3`, then check if `s1` is less than `s2` if yes state `True` else `False`, now take next pair, `s2` and `s3`. Do same with this pair. Since we don't have any next pair to form with `s3` hence `-` — meW, Dec 29 '18 at 07:23
@meW That's just an educated guess. I'd rather know what the OP has in mind. — DYZ, Dec 29 '18 at 07:25
Sorry @DYZ, the formula for calculating each result is s1 < s2 < s3, and I have only one s1, s2 or s3 per name. — ah bon, Dec 29 '18 at 07:52
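
The rule as clarified in the comments above can be sketched in plain Python. This is only an illustration of the pairwise comparison, not code from the thread; the helper name `pairwise_results` and the `ORDER` list are hypothetical, and it assumes at most one value per type under each name.

```python
# Hypothetical helper: compare each type's value with the next type present
# under the same name, following the order s1 < s2 < s3.
ORDER = ["s1", "s2", "s3"]

def pairwise_results(values_by_type):
    """Map each type to True/False, or None when no later type exists."""
    present = [t for t in ORDER if t in values_by_type]
    results = {}
    for current, following in zip(present, present[1:]):
        results[current] = values_by_type[current] < values_by_type[following]
    if present:
        results[present[-1]] = None  # nothing left to compare against
    return results

print(pairwise_results({"s1": 20, "s2": 10}))            # name A
print(pairwise_results({"s1": 18, "s2": 32, "s3": 25}))  # name B
```

Here `None` plays the role of the blank cell in the expected output.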

Scott Boston · Accepted Answer · 2018-12-29T11:20:34.130

Try:

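# Use an ordered pd.Categorical so that sorting follows s1 < s2 < s3 even when the labels do not sort lexicographically.
df['type'] = pd.Categorical(df['type'], ordered=True, categories=['s1','s2','s3'])

# Within each name, subtract the next type's value from the current one (NaN for the last type).
df['result'] = df.sort_values('type').groupby('name')['value'].diff(-1)

# A negative difference means the current value is smaller (True); blank out rows with nothing to compare.
df['result'] = df['result'].lt(0).mask(df['result'].isna(),'')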

df

Output:

   index name type  value result
0      1    A   s1     20  False
1      2    A   s2     10       
2      3    B   s1     18   True
3      4    B   s2     32  False
4      5    B   s3     25
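
Since the question also allows NaN for rows with nothing to compare against, a variant of the same approach could keep missing values instead of empty strings. The following is a sketch under that assumption, not part of the original answer; the intermediate name `diff` and the choice of the nullable `"boolean"` dtype are mine.

```python
import pandas as pd

df = pd.DataFrame({
    "index": [1, 2, 3, 4, 5],
    "name": ["A", "A", "B", "B", "B"],
    "type": ["s1", "s2", "s1", "s2", "s3"],
    "value": [20, 10, 18, 32, 25],
})
df["type"] = pd.Categorical(df["type"], categories=["s1", "s2", "s3"], ordered=True)

# Current value minus the next value within each name, in s1, s2, s3 order.
diff = df.sort_values("type").groupby("name")["value"].diff(-1)

# Assumption: mark "no next type" as <NA> rather than '' so the column
# stays a (nullable) boolean column instead of becoming object dtype.
df["result"] = diff.lt(0).astype("boolean").mask(diff.isna(), pd.NA)

print(df)
```

Keeping the column boolean means later filters such as `df[df["result"].fillna(False)]` work without having to special-case empty strings.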

edited Dec 29 '18 at 11:20

answered Dec 29 '18 at 11:08


Why `sort_values('type')`? The answer still comes out the same without it. Am I missing something? – meW Dec 29 '18 at 11:36
If the dataframe isn't sorted by type, then diff will not work correctly. diff(-1) takes the current row and subtracts the next row regardless of sort order. So, to get diff to perform as expected with s1 – Scott Boston Dec 29 '18 at 11:40
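
The effect described in the comment above can be seen on a small example. The data below is hypothetical: it reuses the values of name B from the question but stores the rows out of type order. The question's own data happens to be sorted already, which is probably why the result looked the same without `sort_values`.

```python
import pandas as pd

# Hypothetical data: name B from the question, rows stored as s3, s1, s2.
df = pd.DataFrame({
    "name": ["B", "B", "B"],
    "type": ["s3", "s1", "s2"],
    "value": [25, 18, 32],
})
df["type"] = pd.Categorical(df["type"], categories=["s1", "s2", "s3"], ordered=True)

# Without sorting, diff(-1) follows storage order: s3 - s1, s1 - s2, then NaN for s2.
unsorted_diff = df.groupby("name")["value"].diff(-1)

# With sorting, it follows s1, s2, s3: s1 - s2, s2 - s3, then NaN for s3.
sorted_diff = df.sort_values("type").groupby("name")["value"].diff(-1)

# Both results keep the original row labels, so they line up side by side.
print(pd.DataFrame({"type": df["type"], "unsorted": unsorted_diff, "sorted": sorted_diff}))
```

In the unsorted case the s3 row is compared against s1 and the s2 row gets NaN, which is not the s1 < s2 < s3 chain the question asks for.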
