I am trying to filter the DataFrame london (which is a copy of another DataFrame, no_eco) so that it keeps only the rows whose 'let' column contains one of the strings passed to the contains() method. The code is as follows:

london = no_eco
london.loc[:,'let'] = london.loc[:,'let'].str.contains('E' or 'D' or 'F' or 'G' or 'H' or 'I' or 'J')
london.loc[:,'let'] = london.loc[:,'let'][london.loc[:,'let']]
london = london.dropna(subset = ['let'])
print(london)

The code works, and the rows that don't contain any of the strings are dropped. However, I receive the following warning:

C:\Users\gerardchurch\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py:543: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

and when looking at the documentation, I still can't understand what I am doing wrong.

Is it okay to continue using the variable london, or will I encounter problems in the future?

Thanks.

geds133
  • Change `london.loc[:,'distId'][london.loc[:,'distId']]` to `london.loc[london.loc[:,'distId'],'distId']` and the warning will go away. Not sure if in this case it's particularly harmful because you're doing it on the RHS of the assignment. – BallpointBen Nov 07 '18 at 14:48

1 Answer

There are several issues with your code:

  1. london = no_eco doesn't assign a copy to london; both names refer to the same DataFrame. Be explicit: london = no_eco.copy().
  2. pd.Series.str.contains supports regex by default, so use str.contains('E|D|F|G|H|I|J'). The expression 'E' or 'D' or ... evaluates to just 'E', so your original call only tests for 'E'.
  3. Your logic is confused. You first replace an object dtype series with a Boolean series, then you assign to it a subset indexed by itself, then use dropna, which is designed for null values.
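The first two points can be checked with a small sketch. The sample data below is made up for illustration, since the question doesn't show what no_eco contains:

```python
import pandas as pd

# Hypothetical sample data; the real no_eco is not shown in the question.
no_eco = pd.DataFrame({"let": ["E1", "A2", "H3", "B4"]})

# Point 1: plain assignment binds a second name to the same object.
alias = no_eco
is_same_object = alias is no_eco                 # no copy was made
is_copy_distinct = no_eco.copy() is not no_eco   # .copy() gives a new object

# Point 2: `or` returns its first truthy operand, so the chain is just 'E'.
pattern_from_or = 'E' or 'D' or 'F' or 'G' or 'H' or 'I' or 'J'

# A regex alternation tests for every letter, not only 'E'.
mask = no_eco["let"].str.contains("E|D|F|G|H|I|J")
matched = no_eco.loc[mask, "let"].tolist()
```

Here pattern_from_or ends up as the single string 'E', which is why the original call appeared to work for some rows but ignored the other letters.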

Instead, just construct a Boolean series and use pd.DataFrame.loc with Boolean indexing:

london = no_eco.copy()
london = london.loc[london['let'].str.contains('E|D|F|G|H|I|J')]

For this particular case, you can use pd.DataFrame.__getitem__ (df[] syntax) directly:

london = no_eco.copy()
london = london[london['let'].str.contains('E|D|F|G|H|I|J')]
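As a rough end-to-end sketch of the same approach, the example below filters first and copies the result, then modifies the copy. The sample data, the val column, the character class [D-J] (an equivalent way of writing the alternation), and na=False (to treat missing values as non-matches) are all assumptions added for illustration, not part of the question:

```python
import pandas as pd

# Hypothetical data, including a missing value in 'let'.
no_eco = pd.DataFrame({"let": ["E1", "A2", None, "J7"], "val": [1, 2, 3, 4]})

# [D-J] matches any single letter from D through J;
# na=False makes rows with a missing 'let' count as non-matches.
mask = no_eco["let"].str.contains("[D-J]", na=False)

# Filter, then take an explicit copy so later writes affect only london.
london = no_eco[mask].copy()
london["val"] = london["val"] * 10
```

Because london is an explicit copy, assigning to it should not trigger SettingWithCopyWarning, and no_eco keeps its original values.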
jpp