1

I want to use .endswith() or regexp in conditional subsetting of Sender name column in my dataframe.

Dataframe df has two columns Sender email, Sender name which I will use to define a subsetting rule, to select all mail coming from a specific shop and specific email of this shop:

df = df[(df["Sender name"]=="Shop_name"]) & (df["Sender email"]=="reply@shop.com")]

  • But then I figured out that there are also mails from buy@shop.com, noreply@shop.com, etc. Is there an any way to neatly introduce all this mailboxes into something like *@shop.com in the second condition?

  • I tried using endswith(), but couldn't figure out how to make it work for series object. I figured out I may first form a list with all mails from the column, and then check if sending mailserver is in it with pd.Series.isin. But maybe there is something more elegant out there?

smci
  • 32,567
  • 20
  • 113
  • 146
gregoruar
  • 345
  • 3
  • 14
  • Your question boils down to ***"How to use regexp on subsetting of string column in pandas?"*** – smci Aug 03 '19 at 12:02
  • Of course, it would be useful to have a link to the guide, if it is already posted elsewhere, but if not - it would be wonderful to have it here! – gregoruar Aug 03 '19 at 12:06
  • There are [25 questions on using `.endswith` in pandas](https://stackoverflow.com/search?q=%5Bpandas%5D+endswith+is%3Aquestion) – smci Aug 03 '19 at 12:16
  • Please skim the pandas doc on [string-handling methods for Series.str](https://pandas.pydata.org/pandas-docs/stable/reference/series.html#string-handling). I don't know what you mean by claiming *"`endswith()` does not work for `series` object."* The doc clearly shows you it does. Please post an example of it not working, include a snippet of input dataframe, your code with regex and the output/error. If you're asking for help writing/debugging a regx then just say so - don't claim the method doesn't work without showing your regex. – smci Aug 03 '19 at 12:21
  • I tried using `df["Sender name"].endswith(...)`, as I did not know about string-handling methods in pandas. Please, mark this question as a dupclicate, if it you consider it appropriate. – gregoruar Aug 03 '19 at 12:43
  • Ah. **You can't call string methods directly on a pandas string Series; you have to use the `.str` accessor**, so `df["Sender name"].str.endswith(...)` – smci Aug 03 '19 at 14:51

2 Answers2

2

Use Series.str.endswith or Series.str.contains with regex - $ for end of string and also escape . by \ , because . is special regex value - any character:

df1 = df[(df["Sender name"]=="Shop_name"]) & (df["Sender email"].str.endswith("@shop.com"))]

Or:

df1 = df[(df["Sender name"]=="Shop_name"]) & (df["Sender email"].str.contains("@shop\.com$"))]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I notice that in Pandas 1.5.x, there is no regex option for `.endswith` but `.contains` will work. Should explicitly specify `regex=True`. In the source code, you can wander around to see what they do. I think implementation is using Python `re` `.search` method on a compiled string, so you could write your own list comprehension to do same work. – pauljohn32 Nov 21 '22 at 23:52
1

Using .query

Since pandas >= 0.25.0 we can use .query with pandas methods (.eq & str.endswith) and using the backtick (`) to query column names with spaces:

df.query('`Sender name`.eq("Shop_name") & `Sender email`.str.endswith("@shop.com")')

Output

       Sender email Sender name
2    reply@shop.com   Shop_name
3      buy@shop.com   Shop_name
4  noreply@shop.com   Shop_name

Example dataframe used:

# Example dataframe
df = pd.DataFrame({'Sender email':['ex@example.com', 'ex2@example.com', "reply@shop.com", "buy@shop.com", "noreply@shop.com"],
                   'Sender name': ['example', 'example', 'Shop_name', 'Shop_name', 'Shop_name']})

       Sender email Sender name
0    ex@example.com     example
1   ex2@example.com     example
2    reply@shop.com   Shop_name
3      buy@shop.com   Shop_name
4  noreply@shop.com   Shop_name
Erfan
  • 40,971
  • 8
  • 66
  • 78