I want to find the rows in a dataframe that contain all of the elements of a tuple, and then set a value in a specific column at the corresponding row index:

for ix, row in df.iterrows():
    if set(tuple).issubset(row):
      df.loc[ix, ['label']] = 1

but I get this error:

TypeError: unhashable type: 'list'

If I run the following chunk of code it seems to work, but I don't know how to set the value in the label column for the rows that match the tuple:

for row in df.astype(str).values.tolist():
  print(set(tuple).issubset(row))

Does anyone have any suggestions?

Thanks for the help

nassim
Guido Cei
  • Can you give more detail about the dataframe? Do you mean you are given a tuple and are searching for it in a two-column dataframe? – nassim Apr 17 '20 at 21:54
  • My dataframe has many columns. I have to find the rows containing all the elements of the tuple. – Guido Cei Apr 19 '20 at 14:02

2 Answers

Use a list comprehension. For example, with randomly generated data:

import pandas as pd
import numpy as np

np.random.seed(2)

tuples = list(zip(np.random.randint(0, 5, 10), np.random.randint(
    10, 15, 10), np.random.randint(20, 30, 10)))

data = pd.DataFrame(dict(tups=tuples))

data.head(3)

#   tups
# 0 (0, 14, 23)
# 1 (0, 14, 25)
# 2 (3, 14, 28)

Then you can set the label column by generating its values with a list comprehension:

tuple_subset = (0, 14)
data['Label'] = [1 if set(tuple_subset).issubset(x)
                 else None for x in data.tups]

data.head(3)

#   tups        Label
# 0 (0, 14, 23) 1.0
# 1 (0, 14, 25) 1.0
# 2 (3, 14, 28) NaN
jcaliz
  • Thanks, I like this solution, but I don't know how to use a list comprehension over the values of all the columns at the same time (iterating one row at a time). – Guido Cei Apr 19 '20 at 14:06
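
For the multi-column case raised in the comment above, one possible sketch is to iterate over the rows of `df.to_numpy()` inside the list comprehension. The frame, its column names, and the `wanted` set below are hypothetical and only for illustration; this assumes every cell holds a hashable value such as a string or a number.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "a": ["x", "y", "x"],
    "b": ["y", "z", "q"],
    "c": ["z", "x", "q"],
})

wanted = {"x", "y"}  # elements that must all appear somewhere in a row

# df.to_numpy() yields one array per row, covering every column at once.
# issubset() accepts any iterable, so each row array can be passed directly.
df["label"] = [1 if wanted.issubset(row) else None for row in df.to_numpy()]

print(df)
```

The label list is built before the new column is assigned, so the `label` column itself is not part of the rows being tested.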

Use enumerate and iloc.

for idx, row in enumerate(df.astype(str).values.tolist()):
    if set(tuple).issubset(row):
        df.iloc[idx, df.columns.get_loc('label')] = 1
Eric Truett
  • Thank you! I have also found this solution, which works: `for ix, row in df.astype(str).iterrows(): if set(tuple).issubset(row): df.loc[ix, ['label']] = 1` but I don't understand very well why. – Guido Cei Apr 19 '20 at 13:59
  • `iterrows` yields tuples of the index (here, ix) and the row values (here, row). So you can set a value in that row with df.loc[ix]. – Eric Truett Apr 19 '20 at 14:12
  • Thanks again, but what I don't understand is why the loop works with `df.astype(str).iterrows()` and not with `df.iterrows()`. – Guido Cei Apr 19 '20 at 15:04
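
A plausible explanation for the last comment, assuming at least one column of the dataframe holds Python lists (the question does not show the data): `issubset` has to hash every value in the row, and a list is unhashable, which raises `TypeError: unhashable type: 'list'`. `astype(str)` converts every cell to a string, and strings are hashable. The sketch below uses a hypothetical frame and a hypothetical `wanted` tuple to reproduce both behaviours.

```python
import pandas as pd

# Hypothetical frame: the "tags" column holds Python lists (unhashable).
df = pd.DataFrame({"name": ["a", "b"], "tags": [["x"], ["y"]], "label": [0, 0]})

# After astype(str), cells are compared by their string form,
# so the elements to look for must be strings as well.
wanted = ("a", "['x']")

raised = False
try:
    for ix, row in df.iterrows():
        set(wanted).issubset(row)  # hashes each cell; the list cell fails
except TypeError as exc:
    raised = True
    print("original frame:", exc)

# Every cell is now a string, so hashing succeeds.
for ix, row in df.astype(str).iterrows():
    if set(wanted).issubset(row):
        df.loc[ix, "label"] = 1

print(df["label"].tolist())
```

One consequence of this approach is that numbers are also compared as text, so a tuple such as `(1, 2)` would need to be written as `("1", "2")` to match.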