Filtering DataFrame by finding exact word (not combined) in a column of strings

Question

My DataFrame has two columns:

Name  Status
 a    I am Good
 b    Goodness!!!
 c    Good is what i feel
 d    Not Good-at-all

I want to filter rows in which Status has a string 'Good' as its exact word, not combined with any other words or characters.

So the output will be:

Name  Status
a    i am Good
c    Good is what i feel

Two other rows had a 'Good' string in it but mixed with other characters, so should not be picked up.

I tried doing:

d = df[df['Status'].str.contains('Good')]  # But all rows come up

I believe some regex like (r'\bGood\b', Status) will do that, but this is not able to sum it up together. And how/where exactly can I fit the regex in a DataFrame filter condition to achieve this? And how to achieve startswith or endswith 'Good' (exact word search)?

score 10 · Accepted Answer · answered May 26 '16 at 10:17

10

If you're defining "exact" to mean no other characters (including punctuation which defines a word boundary \b), you could instead check for a leading and trailing space and/or beginning/end anchors:

>>> df[df['Status'].str.contains(r'(?:\s|^)Good(?:\s|$)')]
  Name               Status
0    a            I am Good
2    c  Good is what i feel

Explanation:

(?:\s|^) is a non-capturing group looking for a space character (\s) or the beginning of the string (^).
Good is the word you're looking for.
(?:\s|$) is a non-capturing group looking for a space character (\s) or the end of the string ($).

answered May 26 '16 at 10:17

Alex Riley

169,130
45
262
238

@ajcr-Thanks , i am accepting your answer, but as you mentioned with exact i can include some special chars like ! ?...So i am curious about how i will allow some/few chars combined with 'Good' like "Good`s" or 'Good!'. Can u please help. – Satya May 26 '16 at 10:38
You could make a character class of characters that are allowed to follow 'good', for example `! ? '`. The second character class would then be `"(?:[\s!?']|$)"`. – Alex Riley May 26 '16 at 10:49
@AlexRiley Can you help me. I want to match only the specific string in rows. I have column with strings and I only want to count that exact string with regex? From the above answer it skips if it found `good.` – Sai Kumar Sep 20 '18 at 13:34

score 2 · Answer 2 · answered Mar 29 '21 at 10:57

Based on a similar question/answer, you can also built your query dynamically as follows:

base = r'^{}'
expr = '(?:\s|^){}(?:,\s|$)'
string='Good'
df[df['Status'].str.contains(base.format(''.join(expr.format(string))))]

This has really helped me when filtering dataframes.

score 0 · Answer 3 · edited Mar 18 '21 at 15:21

0

If you are looking at normal text with commas in, it can also be useful to add the comma in as below.

df[df['Status'].str.contains(r'(?:\s|^)Good(?:,\s|$)')]

edited Mar 18 '21 at 15:21

Syscall

19,327
10
37
52

answered Mar 18 '21 at 15:21

LUFC999

1
2

Filtering DataFrame by finding exact word (not combined) in a column of strings

3 Answers3

Linked

Related