Command line filtering of words in a line of text

Question

How can I filter out some words from a line of text using command line tools?

Example:

line 1: All's Well That Ends Well
filter: That Well
output: Well That Well

Note: a word that occurs twice in the line still appears twice in the output.

Besides a pipeline of GNU utilities, a Python script would also be acceptable.

One idea would be to play around with Python regular expressions and parse that way. — octopusgrabbus, Jun 11 '12 at 18:39

score 2 · Answer 1 · edited Jun 11 '12 at 23:20

You can send (pipe) the text into grep like this:

echo "All's Well That Ends Well" | grep -o '\(That\|Well\)'

answered Jun 11 '12 at 18:39 by Eric Fortis; edited Jun 11 '12 at 23:20 by octopusgrabbus

This also accepts longer words, such as `Wellness`. – eumiro Jun 11 '12 at 18:42
I did not know about `-o`! So this should match whole words only? `echo "All's Well That Ends Well" | grep -o '\(\bThat\b\|\bWell\b\)' | tr '\n' ' '; printf "\n"` The tr + printf looks clunky, though. How can I improve it? – Robottinosino Jun 11 '12 at 18:48

score 2 · Accepted Answer · answered Jun 11 '12 at 18:44

Add \b on each side of a word so that it matches only at word boundaries. Longer words (such as Wellness) will then be rejected. Note that \b and \| are GNU grep extensions.

echo "All's Well That Ends Well" | grep -o '\(\bThat\b\|\bWell\b\)'
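Since the question also allows Python, the same whole-word match can be sketched with the `re` module. This is only an illustration of the idea, not part of the original answer; the function name `keep_words` is made up for the example.

```python
import re

def keep_words(line, words):
    """Return every whole-word occurrence of `words` in `line`, in order."""
    # Build a pattern such as \b(?:That|Well)\b; re.escape makes each
    # word match literally even if it contains regex metacharacters.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.findall(pattern, line)

line = "All's Well That Ends Well"
# Joining with spaces gives a single output line, so no tr/printf is needed.
print(" ".join(keep_words(line, ["That", "Well"])))  # prints: Well That Well
```

As with the \b version of the grep command, a longer word such as Wellness is not matched.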

answered Jun 11 '12 at 18:44 by eumiro

As I commented above. Your modified solution satisfies the requirement. I'll wait to accept to see if there's a more efficient way to do this... but this looks correct to me! I learned about (-o) ! – Robottinosino Jun 11 '12 at 18:51

score 0 · Answer 3 · answered Jun 11 '12 at 18:43

>>> l="All's Well That Ends Well"
>>> k=['Well','That']
>>> [w for w in l.split() if w in k]
['Well', 'That', 'Well']

How do I do this using shell scripting?

answered Jun 11 '12 at 18:43 by Robottinosino

**Well,** this sentence wouldn't be accepted because of the comma. – eumiro Jun 11 '12 at 18:45

score 0 · Answer 4 · answered Jun 11 '12 at 18:46

Here's an idea:

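line = "All's Well That Ends Well"
# Lowercase the filter words too; otherwise 'well' never equals 'Well'.
wanted = set(w.lower() for w in "That Well".split())

# Parenthesized print works in both Python 2 and Python 3.
print([word.lower() for word in line.split() if word.lower() in wanted])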

The last line is a list comprehension, which is an idiomatic ("pythonic") construct. split() breaks a string into a list of words, using the whitespace between them as the separator. I added lower() so that the comparison ignores case; as a result, the matching words are returned in lowercase.

answered Jun 11 '12 at 18:46 by Paul Whalen

`.split()` is bad at treating commas within sentences. – eumiro Jun 11 '12 at 18:46
True, there are a lot of things to consider if this problem is to cover a lot of cases. – Paul Whalen Jun 11 '12 at 18:48
So, how do I do it better then? – Robottinosino Jun 11 '12 at 18:50
Here's a [question](http://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string-in-python) about removing punctuation from a string. You should do that before the list comprehension line. – Paul Whalen Jun 11 '12 at 18:56
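One possible way to act on the punctuation comments is to strip punctuation from the ends of each token before comparing it with the filter words. The sketch below assumes punctuation only matters at the start or end of a word (so the apostrophe in "All's" is left alone); the helper name `filter_words` is hypothetical and not taken from any answer.

```python
import string

def filter_words(line, wanted):
    """Return the words of `line` that appear in `wanted`, duplicates kept."""
    wanted_set = set(wanted.split())
    result = []
    for token in line.split():
        # Remove leading/trailing punctuation only, e.g. "Well," -> "Well".
        word = token.strip(string.punctuation)
        if word in wanted_set:
            result.append(word)
    return result

print(" ".join(filter_words("All's Well, That Ends Well.", "That Well")))
# prints: Well That Well
```

The comparison here is case-sensitive; lowercasing both sides, as in the answer above, would make it case-insensitive.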
