
I am trying to categorize some data with pandas. Basically, my data is a string column and I want to modify each value depending on its first X characters.

I tried this:

data['BO In Code'].loc[data['BO In Code'][:2]=='XU']=1

which raised:

Unalignable boolean Series key provided

This:

data['BO In Code'].loc[str(data['BO In Code'])[:2]=='XU']=1

and this:

data['BO In Code'].loc[data['BO In Code'].index[:2]=='XU']=1

gave me:

'Cannot use a single bool to index into setitem'

Mayeul sgc
This will help you: http://stackoverflow.com/questions/33817842/keyerror-when-using-boolean-filter-on-pandas-data-frame – Harsha W, Apr 18 '17 at 04:40

1 Answer


You need to use the .str string accessor:

data.loc[data['BO In Code'].str[:2]=='XU', 'BO In Code'] = 1
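
A minimal, self-contained run of the line above, assuming a small hypothetical DataFrame (only the column name 'BO In Code' comes from the question; the values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question
data = pd.DataFrame({'BO In Code': ['XU123', 'AB456', 'XU789', 'CD000']})

# Boolean mask: True where the first two characters of the value are 'XU'
mask = data['BO In Code'].str[:2] == 'XU'

# Row indexer = the mask, column indexer = 'BO In Code'
data.loc[mask, 'BO In Code'] = 1

print(data['BO In Code'].tolist())  # [1, 'AB456', 1, 'CD000']
```

Note that after the assignment the column mixes integers and strings, so it keeps the object dtype.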

Explanation

.loc on a DataFrame can take two indexers: one for the rows and one for the columns. Each indexer can be a single label, a list of labels, or a boolean array with the same length as the dimension being sliced.
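
The three kinds of indexer can be sketched on a small hypothetical frame (the column names 'code' and 'qty' and the index labels are invented for illustration):

```python
import pandas as pd

# Hypothetical frame with string index labels so the labels are visible
df = pd.DataFrame(
    {'code': ['XU1', 'AB2', 'XU3'], 'qty': [10, 20, 30]},
    index=['a', 'b', 'c'],
)

# Single label for rows and for columns -> a scalar
single = df.loc['b', 'qty']

# List of row labels, single column label -> a Series
subset = df.loc[['a', 'c'], 'qty']

# Boolean array with one entry per row -> keeps the rows where it is True
flags = [True, False, True]
filtered = df.loc[flags, 'code']

print(single, subset.tolist(), filtered.tolist())  # 20 [10, 30] ['XU1', 'XU3']
```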

In this case, the first indexer is a boolean array in which each value says whether the first 2 characters of the corresponding entry in column 'BO In Code' are equal to 'XU'. We use it to filter the rows of the dataframe. We still need to specify which column to set; here, that is 'BO In Code'.
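
The difference between the question's data['BO In Code'][:2] and the answer's .str[:2] shows up on a small hypothetical Series: plain slicing takes the first two rows, so comparing it to 'XU' yields a boolean Series covering only two rows, which pandas cannot align with the full column. The .str accessor instead slices the characters of every value:

```python
import pandas as pd

# Hypothetical values; the Series name mirrors the question's column
s = pd.Series(['XU123', 'AB456', 'XU789'], name='BO In Code')

# Plain slicing on a Series selects ROWS: a 2-element Series
first_two_rows = s[:2]

# .str slicing works on each STRING: the first 2 characters of every value
first_two_chars = s.str[:2]

# One boolean per row, so it lines up with the whole column
mask = first_two_chars == 'XU'

print(len(first_two_rows), first_two_chars.tolist(), mask.tolist())
# 2 ['XU', 'AB', 'XU'] [True, False, True]
```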

So the first reference to 'BO In Code' builds the boolean mask, and the second reference specifies the column we want to set. They do not have to be the same column.
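
For example, a sketch that builds the mask from 'BO In Code' but writes into a different column. 'Category' is a hypothetical column name chosen for illustration; because it does not exist yet, .loc creates it and fills the rows outside the mask with NaN:

```python
import pandas as pd

# Hypothetical data; 'Category' below is an illustrative new column name
data = pd.DataFrame({'BO In Code': ['XU123', 'AB456', 'XU789']})

# Mask built from 'BO In Code' ...
mask = data['BO In Code'].str[:2] == 'XU'

# ... but the assignment targets another column; unmatched rows get NaN
data.loc[mask, 'Category'] = 1

print(data)
```

The original 'BO In Code' values stay untouched, which is often what you want when categorizing.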

piRSquared