How to use regexp in endswith() in conditional subsetting of pandas df column?

Question

I want to use .endswith() or regexp in conditional subsetting of Sender name column in my dataframe.

Dataframe df has two columns Sender email, Sender name which I will use to define a subsetting rule, to select all mail coming from a specific shop and specific email of this shop:

df = df[(df["Sender name"]=="Shop_name"]) & (df["Sender email"]=="reply@shop.com")]

But then I figured out that there are also mails from buy@shop.com, noreply@shop.com, etc. Is there an any way to neatly introduce all this mailboxes into something like *@shop.com in the second condition?
I tried using endswith(), but couldn't figure out how to make it work for series object. I figured out I may first form a list with all mails from the column, and then check if sending mailserver is in it with pd.Series.isin. But maybe there is something more elegant out there?

Your question boils down to ***"How to use regexp on subsetting of string column in pandas?"*** — smci, Aug 03 '19 at 12:02
Of course, it would be useful to have a link to the guide, if it is already posted elsewhere, but if not - it would be wonderful to have it here! — gregoruar, Aug 03 '19 at 12:06
There are [25 questions on using `.endswith` in pandas](https://stackoverflow.com/search?q=%5Bpandas%5D+endswith+is%3Aquestion) — smci, Aug 03 '19 at 12:16
Please skim the pandas doc on [string-handling methods for Series.str](https://pandas.pydata.org/pandas-docs/stable/reference/series.html#string-handling). I don't know what you mean by claiming *"`endswith()` does not work for `series` object."* The doc clearly shows you it does. Please post an example of it not working, include a snippet of input dataframe, your code with regex and the output/error. If you're asking for help writing/debugging a regx then just say so - don't claim the method doesn't work without showing your regex. — smci, Aug 03 '19 at 12:21
I tried using `df["Sender name"].endswith(...)`, as I did not know about string-handling methods in pandas. Please, mark this question as a dupclicate, if it you consider it appropriate. — gregoruar, Aug 03 '19 at 12:43
Ah. **You can't call string methods directly on a pandas string Series; you have to use the `.str` accessor**, so `df["Sender name"].str.endswith(...)` — smci, Aug 03 '19 at 14:51

jezrael · Accepted Answer · 2019-08-03T11:49:48.443

2

Use Series.str.endswith or Series.str.contains with regex - $ for end of string and also escape . by \ , because . is special regex value - any character:

df1 = df[(df["Sender name"]=="Shop_name"]) & (df["Sender email"].str.endswith("@shop.com"))]

Or:

df1 = df[(df["Sender name"]=="Shop_name"]) & (df["Sender email"].str.contains("@shop\.com$"))]

edited Aug 03 '19 at 11:49

answered Aug 03 '19 at 11:47

jezrael

822,522
95
1,334
1,252

I notice that in Pandas 1.5.x, there is no regex option for `.endswith` but `.contains` will work. Should explicitly specify `regex=True`. In the source code, you can wander around to see what they do. I think implementation is using Python `re` `.search` method on a compiled string, so you could write your own list comprehension to do same work. – pauljohn32 Nov 21 '22 at 23:52

score 1 · Answer 2 · answered Aug 03 '19 at 11:57

Using `.query`

Since pandas >= 0.25.0 we can use .query with pandas methods (.eq & str.endswith) and using the backtick (`) to query column names with spaces:

df.query('`Sender name`.eq("Shop_name") & `Sender email`.str.endswith("@shop.com")')

Output

       Sender email Sender name
2    reply@shop.com   Shop_name
3      buy@shop.com   Shop_name
4  noreply@shop.com   Shop_name

Example dataframe used:

# Example dataframe
df = pd.DataFrame({'Sender email':['ex@example.com', 'ex2@example.com', "reply@shop.com", "buy@shop.com", "noreply@shop.com"],
                   'Sender name': ['example', 'example', 'Shop_name', 'Shop_name', 'Shop_name']})

       Sender email Sender name
0    ex@example.com     example
1   ex2@example.com     example
2    reply@shop.com   Shop_name
3      buy@shop.com   Shop_name
4  noreply@shop.com   Shop_name

How to use regexp in endswith() in conditional subsetting of pandas df column?

2 Answers2

Using `.query`

Linked

How to use regexp in endswith() in conditional subsetting of pandas df column?

2 Answers2

Using .query

Linked

Using `.query`