How can I keep only certain words from a line of text, dropping all the others, using command-line tools?

Example:

line 1: All's Well That Ends Well
filter: That Well
output: Well That Well

Note that a word occurring twice in the input (here, Well) must still appear twice after filtering.

Besides a pipeline of GNU utilities, a Python script would also be acceptable.
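For reference, the required behaviour can be sketched in Python. This is a minimal sketch that assumes words are whitespace-separated tokens and that matching is exact and case-sensitive; the helper name `keep_words` is hypothetical.

```python
def keep_words(line: str, filter_spec: str) -> str:
    """Return the words of `line` that appear in `filter_spec`, in order.

    Hypothetical helper: words are whitespace-separated tokens and
    matching is exact (case-sensitive). Duplicates in `line` are kept.
    """
    wanted = set(filter_spec.split())  # a set gives constant-time membership tests
    return " ".join(word for word in line.split() if word in wanted)


print(keep_words("All's Well That Ends Well", "That Well"))  # Well That Well
```

Because it compares whole tokens, `Wellness` is not kept, but a token with attached punctuation (such as `Well,`) is not kept either.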

Robottinosino

4 Answers


You can send (pipe) the text into grep like this:

echo "All's Well That Ends Well" | grep -o '\(That\|Well\)'
octopusgrabbus
Eric Fortis
  • This also matches inside longer words, such as `Wellness`. – eumiro Jun 11 '12 at 18:42
  • I did not know about -o! So, this should work to match whole words? echo "All's Well That Ends Well" | grep -o '\(\bThat\b\|\bWell\b\)' | tr '\n' ' '; printf "\n" -- The tr + printf looks clunky though. How to improve? – Robottinosino Jun 11 '12 at 18:48
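In Python, the counterpart of `grep -o` is `re.findall`, which returns every non-overlapping match as a list, so putting the matches on one line takes only `" ".join(...)` rather than `tr` plus `printf`. This sketch mirrors the first pattern above without word boundaries, so it shares the `Wellness` problem; the sample line with `Wellness` appended is an illustrative assumption.

```python
import re

line = "All's Well That Ends Well Wellness"

# Same alternation as grep -o '\(That\|Well\)'; no word boundaries,
# so "Well" is also found inside "Wellness".
matches = re.findall(r"That|Well", line)
print(" ".join(matches))  # Well That Well Well
```

Within a shell pipeline, `paste -sd ' ' -` performs the same join and ends its output with a newline, so the extra `printf` is not needed.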

Add \b to match word boundaries too. Longer words (such as Wellness) will be rejected.

echo "All's Well That Ends Well" | grep -o '\(\bThat\b\|\bWell\b\)'
eumiro
  • As I commented above, your modified solution satisfies the requirement. I'll wait before accepting to see if there's a more efficient way to do this, but this looks correct to me! I learned about -o! – Robottinosino Jun 11 '12 at 18:51
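The same word-boundary idea can be sketched in Python by building the pattern from the filter string itself. This sketch assumes each filter word begins and ends with a word character, since `\b` would behave differently around words like `C++`; the variable name `filter_spec` is illustrative.

```python
import re

line = "All's Well That Ends Well Wellness"
filter_spec = "That Well"

# Build \b(?:That|Well)\b from the filter words; re.escape guards
# against words that contain regex metacharacters.
pattern = r"\b(?:" + "|".join(map(re.escape, filter_spec.split())) + r")\b"
matches = re.findall(pattern, line)
print(" ".join(matches))  # Well That Well
```

The non-capturing group `(?:...)` keeps `re.findall` returning whole matches rather than group contents.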
>>> l="All's Well That Ends Well"
>>> k=['Well','That']
>>> [w for w in l.split() if w in k]

How do I do this using shell scripting?

Robottinosino

Here's an idea:

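line = "All's Well That Ends Well"
filter_words = "That Well"

# Lowercase the filter words too; otherwise "that" never equals "That"
wanted = filter_words.lower().split()
print([word.lower() for word in line.split() if word.lower() in wanted])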

The last line uses a list comprehension, which is very "pythonic." split() with no arguments splits a string on runs of whitespace and returns the resulting words as a list. I added lower() so that the comparison ignores case; as a result, the matched words are returned in lowercase.
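If the case-insensitive comparison is wanted but the words should keep their original spelling in the output, the comprehension can lowercase only for the test. A minimal sketch, with a mixed-case sample line chosen for illustration:

```python
line = "All's well That Ends WELL"
filter_spec = "That Well"

# Lowercase the filter once; store it in a set for constant-time lookups.
wanted = {w.lower() for w in filter_spec.split()}

# Compare case-insensitively but keep each word as it appears in the line.
kept = [word for word in line.split() if word.lower() in wanted]
print(kept)  # ['well', 'That', 'WELL']
```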

Paul Whalen