
I am trying to delete rows that contain certain strings. However, I am getting the error:

AttributeError: 'DataFrame' object has no attribute 'str'

Here is my code:

df = df[~df['colB'].str.contains('Example:')] 

How can I fix this?

jezrael
borfo
  • Show us your DataFrame sample. – Frank AK Apr 12 '19 at 02:07
  • colA  colB                   colC
    Alex  Example: some string   15.00
    Jim   Example: other string  12.15
    John  clear                   5.00
    Marc  Yellow                  2.00
    – borfo Apr 12 '19 at 02:13
  • So your error means that df['colB'] does not return a Series but a DataFrame, and a DataFrame has no str attribute. Without investigating the actual df we can't really help you. – Ben Pap Apr 12 '19 at 02:19
  • This might help you out. https://stackoverflow.com/questions/51502263/pandas-dataframe-object-has-no-attribute-str – run-out Apr 12 '19 at 04:00
  • Possible duplicate of [pandas - 'dataframe' object has no attribute 'str'](https://stackoverflow.com/questions/51502263/pandas-dataframe-object-has-no-attribute-str) – run-out Apr 12 '19 at 04:01
  • Show the DataFrame sample in the question, not in comments. – Sreeram TP Apr 12 '19 at 05:17
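The comment above points out that df['colB'] returns a DataFrame instead of a Series. As a minimal sketch, the check below reports which of the common causes applies; diagnose_column is a hypothetical helper written for illustration, not a pandas function, and it only distinguishes duplicated column names from MultiIndex columns.

```python
import pandas as pd


def diagnose_column(df: pd.DataFrame, name) -> str:
    """Hypothetical helper: explain why df[name] may lack the .str accessor."""
    selected = df[name]
    if isinstance(selected, pd.Series):
        return "Series (.str works if the values are strings)"
    if isinstance(df.columns, pd.MultiIndex):
        # Selecting one label of a MultiIndex returns the remaining sub-columns
        return f"DataFrame: columns are a MultiIndex with {df.columns.nlevels} level(s)"
    if df.columns.duplicated().any():
        # Every column sharing the name is returned together
        dups = df.columns[df.columns.duplicated()].unique().tolist()
        return f"DataFrame: duplicated column names {dups}"
    return "DataFrame: unexpected selection result"


# Duplicated names reproduce the error from the question
dup = pd.DataFrame([['Example: s', 'as', 2]], columns=['colB', 'colB', 'colC'])
print(diagnose_column(dup, 'colB'))  # DataFrame: duplicated column names ['colB']
```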

2 Answers


The first possible cause is duplicated column names: selecting colB then returns a DataFrame instead of a Series:

import pandas as pd

df = pd.DataFrame([['Example: s', 'as', 2], ['dd', 'aaa', 3]], columns=['colB','colB','colC'])
print (df)
         colB colB  colC
0  Example: s   as     2
1          dd  aaa     3

print (df['colB'])
         colB colB
0  Example: s   as
1          dd  aaa

#print (df['colB'].str.contains('Example:'))
#>AttributeError: 'DataFrame' object has no attribute 'str'

One solution is to join the duplicated columns together:

print (df['colB'].apply(' '.join, axis=1))
0    Example: s as
1           dd aaa

df['colB'] = df.pop('colB').apply(' '.join, axis=1)
df = df[~df['colB'].str.contains('Example:')] 
print (df)
   colC    colB
1     3  dd aaa
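If the second colB is only an accidental copy, a different option (not from the original answer) is to keep just the first column for each name with Index.duplicated. This is a sketch under that assumption: unlike joining, it discards the data in the later duplicate columns.

```python
import pandas as pd

# Same sample as in the answer: two columns are both named 'colB'
df = pd.DataFrame([['Example: s', 'as', 2], ['dd', 'aaa', 3]],
                  columns=['colB', 'colB', 'colC'])

# Keep the first column for each name; the second 'colB' ('as', 'aaa') is dropped
deduped = df.loc[:, ~df.columns.duplicated()]

# deduped['colB'] is now a Series, so the .str accessor is available
result = deduped[~deduped['colB'].str.contains('Example:')]
print(result)
```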

The second possible cause is a hidden MultiIndex, that is, a one-level MultiIndex in the columns that prints like ordinary column names:

df = pd.DataFrame([['Example: s', 'as', 2], ['dd', 'aaa', 3]], columns=['colA','colB','colC'])
df.columns = pd.MultiIndex.from_arrays([df.columns])
print (df)
         colA colB colC
0  Example: s   as    2
1          dd  aaa    3

print (df['colB'])
  colB
0   as
1  aaa

#print (df['colB'].str.contains('Example:'))
#>AttributeError: 'DataFrame' object has no attribute 'str'

The solution is to reassign the columns from the first level (in this sample colB never contains 'Example:', so both rows are kept):

df.columns = df.columns.get_level_values(0)
df = df[~df['colB'].str.contains('Example:')] 
print (df)
         colA colB  colC
0  Example: s   as     2
1          dd  aaa     3
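One way such a hidden MultiIndex can appear is assigning the column names as a list of lists, which pandas interprets as one array per level. The sketch below illustrates that assumption (it is not necessarily how the question's DataFrame was built) and fixes it by assigning a flat list; the variable names hidden and fixed are illustrative.

```python
import pandas as pd

hidden = pd.DataFrame([['Example: s', 'as', 2], ['dd', 'aaa', 3]])
# A list *of lists* is read as arrays of levels, so this creates
# a one-level MultiIndex that prints like ordinary column names
hidden.columns = [['colA', 'colB', 'colC']]
print(type(hidden.columns).__name__, hidden.columns.nlevels)  # MultiIndex 1

# Assigning a flat list gives an ordinary Index, so fixed['colA'] is a Series
fixed = hidden.copy()
fixed.columns = ['colA', 'colB', 'colC']
result = fixed[~fixed['colA'].str.contains('Example:')]
print(result)
```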

The third possible cause is a real MultiIndex in the columns:

df = pd.DataFrame([['Example: s', 'as', 2], ['dd', 'aaa', 3]], columns=['colA','colB','colC'])
df.columns = pd.MultiIndex.from_product([df.columns, ['a']])
print (df)
         colA colB colC
            a    a    a
0  Example: s   as    2
1          dd  aaa    3

print (df['colB'])
     a
0   as
1  aaa

print (df.columns)
MultiIndex(levels=[['colA', 'colB', 'colC'], ['a']],
           codes=[[0, 1, 2], [0, 0, 0]])

#print (df['colB'].str.contains('Example:'))
#>AttributeError: 'DataFrame' object has no attribute 'str'

One solution is to select the column with a tuple:

df1 = df[~df[('colB', 'a')].str.contains('Example:')] 
print (df1)
         colA colB colC
            a    a    a
0  Example: s   as    2
1          dd  aaa    3

Or reassign the first level back as the columns:

df.columns = df.columns.get_level_values(0)
df2 = df[~df['colB'].str.contains('Example:')] 
print (df2)
         colA colB  colC
0  Example: s   as     2
1          dd  aaa     3

Or, starting again from the MultiIndex DataFrame, remove the second level:

df.columns = df.columns.droplevel(1)
df2 = df[~df['colB'].str.contains('Example:')] 
print (df2)
         colA colB  colC
0  Example: s   as     2
1          dd  aaa     3
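Keeping only the first level (with get_level_values(0) or droplevel(1)) brings back the first problem when that level repeats, because the flattened names are duplicated. A different option, not from the original answer, is to join the levels of each column tuple into one unique name. This sketch assumes a hypothetical second 'colB' column under level 'b':

```python
import pandas as pd

df = pd.DataFrame([['Example: s', 'as', 2, 'x'], ['dd', 'aaa', 3, 'y']])
# 'colB' appears twice in the first level, so dropping level 1 would duplicate it
df.columns = pd.MultiIndex.from_tuples(
    [('colA', 'a'), ('colB', 'a'), ('colC', 'a'), ('colB', 'b')])

# Join the levels of each column tuple into one flat, unique name
df.columns = ['_'.join(map(str, col)) for col in df.columns]
print(df.columns.tolist())  # ['colA_a', 'colB_a', 'colC_a', 'colB_b']

result = df[~df['colA_a'].str.contains('Example:')]
print(result)
```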
jezrael
  • Came here because I had duplicate columns, as the first part of this says. So obvious once I saw it; how did I not figure that out immediately? The pointer is much appreciated. – T. Shaffner Dec 01 '22 at 21:00

Try this, which checks every column of each row for the string:

df[[not df.iloc[i, :].str.contains('String_to_match').any() for i in range(len(df))]]
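The answer above loops over rows in Python. A vectorized variation of the same idea (not the answerer's code) builds one boolean column per original column and combines them with any(axis=1). This sketch uses the sample data posted in the comments; columns are taken by position with iloc, so duplicated names do not matter, and astype(str) is an assumption that lets numeric columns such as colC be searched without raising.

```python
import pandas as pd

# Sample data from the question's comments
df = pd.DataFrame({'colA': ['Alex', 'Jim', 'John', 'Marc'],
                   'colB': ['Example: some string', 'Example: other string',
                            'clear', 'Yellow'],
                   'colC': [15.00, 12.15, 5.00, 2.00]})

# One boolean Series per column, selected by position so repeated names are fine
hits = pd.concat(
    [df.iloc[:, i].astype(str).str.contains('Example:', regex=False)
     for i in range(df.shape[1])],
    axis=1)

# Keep only rows where no column contains the substring
result = df[~hits.any(axis=1)]
print(result)
```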

hacker315