How to iterate using iterrows() and check for a subset

Question

I would like to find the rows in a dataframe that contain all of the elements of a tuple, and then set a value in a specific column at the corresponding row index:

for ix, row in df.iterrows():
    if set(tuple).issubset(row):
      df.loc[ix, ['label']] = 1

But I get this error:

TypeError: unhashable type: 'list'

If I run the following code it seems to work, but I don't know how to set the value in the label column for the rows that match the tuple:

for row in df.astype(str).values.tolist():
    print(set(tuple).issubset(row))

Does anyone have any suggestions?

Thanks for the help

can you give more detail about the dataframe ? do you mean you are giving a tuple and searching for it in a two column dataframe — nassim, Apr 17 '20 at 21:54
My dataframe has many columns. I have to find the line containing all the elements of the tuple — Guido Cei, Apr 19 '20 at 14:02

score 0 · Answer 1 · answered Apr 17 '20 at 22:33

Use a list comprehension. For example, with randomly generated data:

import pandas as pd
import numpy as np

np.random.seed(2)

tuples = list(zip(np.random.randint(0, 5, 10), np.random.randint(
    10, 15, 10), np.random.randint(20, 30, 10)))

data = pd.DataFrame(dict(tups=tuples))

data.head()

#   tups
# 0 (0, 14, 23)
# 1 (0, 14, 25)
# 2 (3, 14, 28)

Then you can set the label column by generating its values with a list comprehension:

tuple_subset = (0, 14)
data['Label'] = [1 if set(tuple_subset).issubset(x)
                 else None for x in data.tups]

data.head(3)

#   tups        Label
# 0 (0, 14, 23) 1.0
# 1 (0, 14, 25) 1.0
# 2 (3, 14, 28) NaN

Thanks, I like this solution but I don't know how to use list comprehension with all the values of all the columns at the same time (iterating one row at a time). — Guido Cei, Apr 19 '20 at 14:06
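A minimal sketch of how the list comprehension could be applied to whole rows rather than to a single column of tuples. The frame, its column names, and the `wanted` tuple are hypothetical stand-ins for the asker's data, and the comparison is done on strings, as in the question's second snippet.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "a": ["x", "y", "x"],
    "b": ["p", "q", "r"],
    "c": [1, 2, 3],
})

# Hypothetical elements to look for, written as strings because the
# rows are converted to strings before the comparison.
wanted = ("x", "1")

# One list of cell values per row, all cast to str.
rows = df.astype(str).values.tolist()

# 1 where the row contains every element of `wanted`, otherwise None.
df["label"] = [1 if set(wanted).issubset(row) else None for row in rows]
```

Because the comparison happens on strings, an integer such as `1` in the frame only matches the string `"1"` in `wanted`.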

score 0 · Accepted Answer · answered Apr 18 '20 at 00:41


Use enumerate and iloc.

for idx, row in enumerate(df.astype(str).values.tolist()):
    if set(tuple).issubset(row):
        df.iloc[idx, df.columns.get_loc('label')] = 1


Eric Truett


Thank you! I have also found this solution, which works: `for ix, row in df.astype(str).iterrows(): if set(tuple).issubset(row): df.loc[ix, ['label']] = 1`, but I don't really understand why – Guido Cei Apr 19 '20 at 13:59
`iterrows` returns a tuple of the index (here, `ix`) and the row values (here, `row`). So you can set the value of a row with `df.loc[ix]`. – Eric Truett Apr 19 '20 at 14:12
Thanks again, but what exactly I don't understand is why the loop works with `df.astype(str).iterrows()` and not with `df.iterrows()` – Guido Cei Apr 19 '20 at 15:04
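One plausible explanation, offered as a sketch rather than a confirmed diagnosis: `set.issubset` has to hash every element of the row it is given, so a single cell holding a list raises `TypeError: unhashable type: 'list'`. After `astype(str)` every cell is a string, and strings are hashable. The frame below is hypothetical. That the asker's data contains list-valued cells is an assumption, though it is consistent with the same tuple working in the `astype(str)` version.

```python
import pandas as pd

# Hypothetical frame: column "b" holds lists (an assumption about the
# asker's data, not something stated in the question).
df = pd.DataFrame({"a": ["x", "y"], "b": [["p", "q"], ["r"]]})
wanted = ("x",)

# Raw row: issubset must hash each cell, and a list cannot be hashed.
raw_row = df.iloc[0]
try:
    set(wanted).issubset(raw_row)
    raised = False
except TypeError:
    raised = True

# String row: every cell is now a str, so hashing succeeds.
str_row = df.astype(str).iloc[0]
found = set(wanted).issubset(str_row)
```

Here `raised` ends up `True` and `found` ends up `True`. After the cast, the elements of the tuple also have to be strings for a match to occur.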
