I have a file with a column of words:

scatman
batman
superman
scatman
scatman
batman
superman
scatman
scatman
batman
superman
scatman
batman
WWWWWWWW
superman
scatman
batman
superman
scatman

I need to produce a pattern in which the three words scatman, batman, superman follow one another in that order. Where a word is repeated (like scatman and scatman on lines 4 and 5), or where there are other words, I should cut them. I have written:

grep "scatman\|batman\|superman" file

OK, this rejects the word WWWWWWWW, but I can't work out how to make the words in my column follow one another in order. The result is:

scatman
batman
superman
scatman
scatman
batman
superman
scatman
scatman
batman
superman
scatman
batman
superman
scatman
batman
superman
scatman

Lines 4 and 5 still contain a repeated word, which I don't want. Where is my mistake?

Valeriu

3 Answers

Grepping something

At line 4 & 5 i have repeat words, but i don't like that

To omit repeated adjacent lines, add | uniq at the end of your command.
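
A minimal sketch of that suggestion, run on a shortened copy of the sample column through a here-document instead of a file. The function name filter_words is made up for illustration. Note that uniq only collapses adjacent duplicates, which is enough here because grep has already dropped the unrelated words; it does not check that the words come in the right order.

```shell
#!/bin/sh
# filter_words is a hypothetical helper name: keep only lines containing one
# of the three wanted words, then collapse runs of identical adjacent lines.
filter_words() {
    grep 'scatman\|batman\|superman' | uniq
}

filter_words <<'EOF'
scatman
scatman
batman
WWWWWWWW
superman
scatman
EOF
```

With this input the repeated scatman and the WWWWWWWW line should both disappear.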

yvs2014

This will do exactly what you want:

#!/bin/bash
# Words in the order they must appear; the sequence repeats cyclically.
array=("scatman" "batman" "superman")
count=0   # index of the word expected next

while IFS= read -r line; do
    # Keep the line only if it is the word expected next in the cycle;
    # repeated, out-of-order and unrelated words are skipped.
    if [[ $line == "${array[count]}" ]]; then
        printf '%s\n' "$line"                    # every word on a new line
        # printf '%s ' "$line"                   # or: all words on one line

        # To write to a file instead, append >> newfile.txt to the printf you use.

        # Advance to the next expected word, wrapping around after the last one.
        count=$(( (count + 1) % ${#array[@]} ))
    fi
done < file.txt   # this is your original file

Depending on the output option you choose, it prints each word on its own line, like this:

scatman
batman
superman
scatman
batman
superman
scatman
batman
superman
scatman
batman
superman
scatman
batman
superman
scatman

or all on one line, like this:

scatman batman superman scatman batman superman scatman batman superman scatman batman superman scatman batman superman scatman
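
For comparison, the same "print only the word expected next" idea can be sketched in awk. This is an alternative formulation under my own assumptions, not the script above; the function name cycle_filter and the sample input are hypothetical. Unlike a plain duplicate filter, it also drops words that arrive out of order.

```shell
#!/bin/sh
# cycle_filter is a hypothetical name. It prints a line only when it is the
# word expected next in the cycle scatman -> batman -> superman.
cycle_filter() {
    awk 'BEGIN { n = split("scatman batman superman", want, " "); i = 1 }
         $0 == want[i] { print; i = i % n + 1 }'
}

cycle_filter <<'EOF'
scatman
superman
batman
batman
superman
scatman
EOF
```

Here the first superman (out of order) and the second batman (a repeat) are skipped, leaving scatman, batman, superman, scatman.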
Talal Al-Khalifa
  • I want the output: scatman batman superman scatman batman superman scatman batman superman scatman batman superman scatman batman superman. Where a word is repeated (scatman scatman), I should cut it, and likewise where there are other words. – Valeriu Jan 20 '17 at 21:48
  • Do you mean repeat only the three words, again and again? Or do you mean only show where these words are repeated, so you can delete the repeats? – Talal Al-Khalifa Jan 20 '17 at 22:07
  • @valeriu This command will sort text and print them to a new file: `sort testt.txt | uniq >> test.txt` – Talal Al-Khalifa Jan 20 '17 at 22:14
  • @valeriu I think I understand you now. You need only these words after each other in order scatman batman superman scatman batman superman and not scatman scatman batman ... I will try something – Talal Al-Khalifa Jan 20 '17 at 22:25
  • @valeriu the above code will only use the 3 words that you want and ignore the rest – Talal Al-Khalifa Jan 20 '17 at 23:30

This can be done with grep and awk:

grep -E 'scatman|batman|superman' words.txt |
awk '{
      last_word = cur_word
      cur_word = $0
      if (cur_word == last_word)
        next
      else
        print $0
      }'

The -E option makes grep use extended regular expressions, in which | means "or" between the search targets. The awk code compares each line with the previous one and skips it when the two are equal.
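
As a sketch of the same two steps on inline sample data: the -x option is an addition that is not in the answer above; it makes grep match whole lines only, so a line such as batmans is rejected. The awk part is a shorter way of writing the same adjacent-duplicate check. The function name dedupe_words is hypothetical.

```shell
#!/bin/sh
# dedupe_words is a hypothetical helper name.
# grep -E: extended regex, so | means "or"; -x: the whole line must match.
# awk: print a line only when it differs from the previous line.
dedupe_words() {
    grep -E -x 'scatman|batman|superman' |
        awk '$0 != prev { print } { prev = $0 }'
}

dedupe_words <<'EOF'
scatman
scatman
batmans
batman
WWWWWWWW
batman
superman
EOF
```

Because grep runs first, the two batman lines become adjacent once WWWWWWWW is removed, so awk collapses them as well.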

You can do all this in one line, if you want to:

grep -E 'scatman|batman|superman' words.txt | awk '{ last_word = cur_word; cur_word = $0; if (cur_word == last_word) next; else print $0 }'
Greg Tarsa