How to check condition on multiple columns after using groupby on pandas data frame?

Question

I have following df:

data = [
    ['a', 'one', 10],
    ['a', 'two', 10],
    ['b', 'one', 13],
    ['b', 'two', 100],
    ['c', 'two', 100],
    ['c', 'one', 100],
    ['d', 'one', 100],
    ['d', 'one', 10
]
df = pd.DataFrame(data, columns=['key1', 'key2', 'key3'])

key1    key2    key3
0   a   one 10
1   a   two 10
2   b   one 13
3   b   two 100
4   c   two 100
5   c   one 100
6   d   one 100
7   d   one 10

I need to group by key1,

search for key2 == 'one' (for single occurrence result = OK, for 0 or >1 = NOK),
also additionally check if key3 == 10 or 100 then OK, anything else is NOK.

As result I should have OK or NOK. How can I get something like the following as output?

key1  result
a   ok
b   nok
c   ok
d   nok

I found a similar question and answer here but without additional check for key3. What I have so far is the following:

test = df.groupby('key1')['key2'].apply(lambda x:(x=='one').sum()).reset_index(name='result')
test['result'].where(~(test['result'] > 1), other='nok', inplace=True)
test['result'].replace([0, 1], ['nok', 'ok'], inplace = True)

When I run this code, it gives me following output:

key1    result
0   a   ok
1   b   ok
2   c   ok
3   d   nok

How can I add the check for key3?

Please put your dataframe as text rather than images. It will help you get more answers. — Vishnudev Krishnadas, Jul 30 '23 at 09:30

score 2 · Answer 1 · answered Jul 30 '23 at 09:49

2

IIUC, you can try:

def fn(x):
    # filter "one"
    x = x[x['key2'] == 'one']

    # there are zero or >1, return "nok"
    if len(x) == 0 or len(x) > 1:
        return 'nok'

    # is the single `key3` 10 or 100? If yes, return "ok"
    if x['key3'].iat[0] in [10, 100]:
        return 'ok'

    # otherwise "nok"
    return 'nok'

out = df.groupby('key1').apply(fn).reset_index(name="result")
print(out)

Prints:

  key1 result
0    a     ok
1    b    nok
2    c     ok
3    d    nok

answered Jul 30 '23 at 09:49

Andrej Kesely

168,389
15
48
91

quick question, if I want for example to change 'one' for another value and take it as an input to fn(x, key2_var). How should I call this function inside of apply(fn)? – Koke Abeke Jul 30 '23 at 12:36
@KokeAbeke You could call it like `out = df.groupby('key1').apply(lambda x: fn(x, "other value"))` – Andrej Kesely Jul 30 '23 at 13:00

score 1 · Answer 2 · answered Jul 30 '23 at 12:40

One-liner using:

groupby('key1')['key2'].value_counts().xs('one', level='key2') == 1) for the check on key2
groupby('key1')['key3'].unique() to get key3 values for each key1
isin([10, 100]).all(axis=1) to check condition on key3

pd.DataFrame({"result": ((df.groupby("key1")["key2"].value_counts().xs("one", level="key2") == 1) & (pd.DataFrame.from_dict(dict(zip(df.groupby("key1")["key3"].unique().index, df.groupby("key1")["key3"].unique().values)), orient="index").isin([10, 100, np.nan]).all(axis=1))).replace([True, False], ["ok", "nok"]),})

It is ~2x slower than using groupby().apply(). See comparison below:

def fast(): # from @Adrej Kesely (10035985)
    def fn(x):
        # filter 'one'
        x = x[x['key2'] == 'one']

        # there are zero or >1, return 'nok'
        if len(x) == 0 or len(x) > 1:
            return 'nok'

        # is the single `key3` 10 or 100? If yes, return 'ok'
        if x['key3'].iat[0] in [10, 100]:
            return 'ok'

        # otherwise 'nok'
        return 'nok'
    return df.groupby('key1').apply(fn).reset_index(name='result')

%timeit fast()
# 677 µs ± 13.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

def slow():
    return pd.DataFrame({"result": ((df.groupby("key1")["key2"].value_counts().xs("one", level="key2") == 1) & (pd.DataFrame.from_dict(dict(zip(df.groupby("key1")["key3"].unique().index, df.groupby("key1")["key3"].unique().values)), orient="index").isin([10, 100, np.nan]).all(axis=1))).replace([True, False], ["ok", "nok"]),})

%timeit slow()
# 1.42 ms ± 36.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Documentation for methods used:

value_counts() to count occurrences
xs() to get a cross section of a data frame
unique()
isin() to check whether there are key3 values that are not in the "ok" list
all(axis=1) to check truth values by row
replace() to replace bools with "ok" and "nok"

Hi @Joshua, your code is kinda advanced one for me, thank you though) — Koke Abeke, Jul 30 '23 at 12:50
No worries! It was mostly as a fun exercise since you already had a working answer from Andrej. Good luck! — Joshua Shew, Jul 30 '23 at 13:00

How to check condition on multiple columns after using groupby on pandas data frame?

2 Answers2