First I create a data set using the pandas function pd.read_sql(). As far as I know, all imported columns are strings.

Then I create a new empty column and define a function, like so (tinyurl.com/tnr9b83):

df['status_update'] = ""
def f(row):
    if (row['priority'] in ("1","2")) and (row['failed'] == "Y"):
        val = "F"
    elif (row['priority'] in ("1","2")):
        val = row['status'].str.slice(0,1)
    else:
        val = "X"
    return val

Then I try to change every row of my data set so that:

  • if a record has priority in ("1","2") and failed = "Y", it gets status_update = "F"
  • else if a record has priority in ("1","2"), it gets a status_update = the first letter substring of column 'status'
  • else it gets status_update = "X"

So I run:

df['status_update'] = df.apply(f, axis=1)

...but this gives:

AttributeError: 'str' object has no attribute 'str'

I've tried alternative syntax to no avail. Others who report this error seem to have different situations and resolutions. As a new python programmer, what are the best first steps/tools/functions toward understanding why this syntax/logic won't work in this situation?

Edit: clarification: the error is related to "val = row['status'].str.slice(0,1)".

Edit 2: worth noting, when I opened the data viewer it had something like []...[]...[] instead of a single character value for many observations in the new 'status_update' field, so I'm guessing that some kind of array or vector is being returned instead of a single substring.

lewispgj
  • What is `row`? Is it a dataframe row object, or is it a plain dictionary? Try putting `print(type(row))` at the top of the function. – John Gordon Jun 09 '20 at 16:53
  • 'row' is a placeholder for a dataframe row object, I believe. I added your suggestion but do not see relevant output to share. The purpose of the script is to change the value of column "status_update" for all rows based on the provided criteria. Syntax was lifted from top answer here: tinyurl.com/tnr9b83 – lewispgj Jun 09 '20 at 17:00
  • The error says that `row['status']` is already a `str`. What kind of object has a `str` attribute? In Python each class of object has documented attributes and methods. If your class is wrong, or the attribute is wrong, you'll get this kind of error. It's not a matter of syntax. You need to match objects with their correct attributes. – hpaulj Jun 09 '20 at 17:35
  • Thanks, hpaulj. I'm not sure if it's a rhetorical question, but I'm guessing that str objects have str attributes, so, as you say, row['status'] is already a str. But if that's true, why doesn't changing it to `row['status'].slice(0,1)` work, as Gels_YT suggested? – lewispgj Jun 09 '20 at 17:45

2 Answers

val = row['status'].str.slice(0,1)

Try removing the .str from this line, so that it becomes:

val = row['status'].slice(0,1)

I don't know what your row is, but I'm guessing row['status'] is a string.

  • One of many solutions I tried. This returns "AttributeError: 'str' object has no attribute 'slice'". Doesn't this imply that Python is working on a 'str' object, though, as we both expect? When I run `df['status_update'].dtypes` I get `dtype('O')`, which seems to mean Python str objects according to this answer: https://stackoverflow.com/questions/37561991/what-is-dtypeo-in-pandas – lewispgj Jun 09 '20 at 17:11
  • `slice` is an extra method in the `Series.str` collection. It is not a Python `str` method. Presumably it behaves like slice indexing: `astring[0:1]`. – hpaulj Jun 09 '20 at 17:54
0

Let's define a simple dataframe with string elements. You really should supply such an example. It makes it easier to debug and suggest fixes. I might be wrong about the essential characteristics of your frame. In any case:

In [273]: df1 = pd.DataFrame([['abc'],['bcd']], columns=['a'])                                
In [274]: df1                                                                                 
Out[274]: 
     a
0  abc
1  bcd

A Series:

In [275]: df1['a']                                                                            
Out[275]: 
0    abc
1    bcd
Name: a, dtype: object
In [276]: type(df1['a'])                                                                      
Out[276]: pandas.core.series.Series

It has a str attribute, which gives access to some string methods:

In [277]: df1['a'].str                                                                        
Out[277]: <pandas.core.strings.StringMethods at 0x7febb75c4ba8>
In [278]: df1['a'].str.upper()                                                                
Out[278]: 
0    ABC
1    BCD
Name: a, dtype: object
In [279]: df1['a'].str.slice(0,1)                                                             
Out[279]: 
0    a
1    b
Name: a, dtype: object

Now define a function that can be 'applied' as you do. First get a clear idea of what objects the function has to work with. NO GUESSING!

In [280]: def foo(row): 
     ...:     print(type(row), type(row['a'])) 
     ...:     return row 
     ...:                                                                                     
In [281]: df1.apply(foo, axis=1)                                                              
<class 'pandas.core.series.Series'> <class 'str'>
<class 'pandas.core.series.Series'> <class 'str'>
<class 'pandas.core.series.Series'> <class 'str'>
Out[281]: 
     a
0  abc
1  bcd

So row['a'] is a string, not a series. As the error shows, a str object does not itself have a str attribute. It does have methods like upper. And to slice it we have to use the indexing notation, not a slice method.

In [284]: def foo(row): 
     ...:     print(type(row), type(row['a'])) 
     ...:     return row['a'][0:1] 
     ...:                                                                                     
In [285]: df1.apply(foo, axis=1)                                                              
<class 'pandas.core.series.Series'> <class 'str'>
<class 'pandas.core.series.Series'> <class 'str'>
Out[285]: 
0    a
1    b
dtype: object

or applying the str.slice to the row Series object:

In [288]: def foo(row): 
     ...:     return row.str.slice(0,1) 
     ...:      
     ...:                                                                                     
In [289]: df1.apply(foo, axis=1)                                                              
Out[289]: 
   a
0  a
1  b

When you get an attribute error, check the class of the object, and its docs. One way or another there's a mismatch between the class of the object and the attribute/method you are trying to use. Sometimes it's because you misread the docs (or didn't read them in the first place), but more often it's because the object isn't of the class you think it should be.
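
Applied to the function from the question, the same fix might look like the sketch below. The column names come from the question; the sample values are made up here, since the real frame wasn't posted, so treat this as an illustration rather than a drop-in replacement.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question,
# the values are invented for illustration only.
df = pd.DataFrame({
    "priority": ["1", "2", "3", "1"],
    "failed":   ["Y", "N", "N", "N"],
    "status":   ["Open", "Closed", "Open", "Pending"],
})

def f(row):
    # With axis=1, row is a Series, but row['status'] is a plain
    # Python str, so slice it with indexing instead of .str.slice().
    if row["priority"] in ("1", "2") and row["failed"] == "Y":
        return "F"
    elif row["priority"] in ("1", "2"):
        return row["status"][0:1]
    else:
        return "X"

df["status_update"] = df.apply(f, axis=1)
print(df["status_update"].tolist())
```

Using `[0:1]` rather than `[0]` means an empty status string gives an empty string back instead of raising an IndexError.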

===

checking for a slice attribute/method:

In [297]: df1.slice                                                                           
....
AttributeError: 'DataFrame' object has no attribute 'slice'

In [298]: df1['a'].slice                                                                      
....
AttributeError: 'Series' object has no attribute 'slice'

In [299]: df1['a'].str.slice                                                                  
Out[299]: <bound method StringMethods.slice of <pandas.core.strings.StringMethods object at 0x7febb75c4ba8>>

In [300]: 'astring'.slice                                                                     
...
AttributeError: 'str' object has no attribute 'slice'
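
Since `slice` exists only on `Series.str`, another option is to skip the row-wise `apply` and work on whole columns, where `.str.slice` is the right tool. The following is a rough sketch under the same assumptions as the question (all columns hold strings); the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question,
# the values are invented for illustration only.
df = pd.DataFrame({
    "priority": ["1", "2", "3", "1"],
    "failed":   ["Y", "N", "N", "N"],
    "status":   ["Open", "Closed", "Open", "Pending"],
})

# Boolean Series, one entry per row.
high = df["priority"].isin(["1", "2"])
failed = df["failed"] == "Y"

# df['status'] is a Series, so .str.slice() is available here.
first_letter = df["status"].str.slice(0, 1)

# Keep the first letter where priority is high, otherwise "X";
# then overwrite with "F" where the record also failed.
df["status_update"] = first_letter.where(high, "X").mask(high & failed, "F")
print(df["status_update"].tolist())
```

Here `where` keeps values where the condition is True and `mask` replaces them where it is True, so the order of the two calls mirrors the if/elif/else in the original function.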
hpaulj
  • Thank you so much, hpaulj; this makes good sense and helps me understand how to approach these sorts of errors in the future! – lewispgj Jun 09 '20 at 18:19