15

Creating a dataframe using subsetting with below conditions

subset_df = df_eq.loc[(df_eq['place'].str.contains('Chile')) & (df_eq['mag'] > 7.5),['time','latitude','longitude','mag','place']]

Want to replicate the above subset using query() in Pandas.However not sure how to replicate str.contains() equivalent in Pandas query. "like" in query doesn't seem to work

query_df = df_eq[['time','latitude','longitude','mag','place']].query('place like \'%Chile\' and mag > 7.5')

place like '%Chile'and mag >7.5 
            ^
SyntaxError: invalid syntax

Any help will be appreciated

raul
  • 631
  • 2
  • 10
  • 23
  • I'm grasping at straws here, but you _might_ be able to use python's `in` operator if you set the `engine='python'`. If it works, it will likely end up with a pretty inefficient query (normally `pandas` tries to use `numexpr` to speed things up but `numexpr` doesn't support the `in` operator ...) – mgilson Jul 29 '16 at 15:14
  • 3
    AFAIK, SQL `like` operator is not yet implemented in pandas `query()` method, so you can't do it using `query()` method – MaxU - stand with Ukraine Jul 29 '16 at 17:39
  • Thanks for your comments.Yeah like operator isn't there so the work around remains str.contains() – raul Jul 30 '16 at 07:03
  • Hi from Chile, I'm wondering why you use `.iloc` in this case the following should be enough `df_eq[(df_eq['place'].str.contains('Chile')) & (df_eq['mag'] > 7.5)][['time','latitude','longitude','mag','place']]` – rpanai Nov 16 '18 at 19:46

3 Answers3

12

As of now I am able to do this by using the engine='python' argument of the .query method to use str.contains inside a query.

This should work:

query_df = df_eq[['time', 'latitude', 'longitude', 'mag', 'place']].query(
    "place.str.contains('Chile') and mag > 7.5", engine="python")
petobens
  • 1,330
  • 1
  • 12
  • 23
9

What I think is going on here is that you are not able to utilize the method str.contains within the query pandas method. What you can do is create a mask and refer to that mask from within query using the at sign (@). Try this:

my_mask = df_eq["feature"].str.contains('my_word')
df_eq.query("@my_mask")
7

Using str.contains works for me in pandas 1.0.0 with this syntax:

df.query("columnA == 'foo' and columnB.str.contains('bar')")
eddygeek
  • 4,236
  • 3
  • 25
  • 32
  • 1
    Check if "numexpr" module is installed or not. If not, a default "python" engine is used where `str.contains` is a valid expression. – ARA1307 Apr 21 '20 at 20:42