Grep the last occurrence of different elements in a big file

Question

I have a file where different elements are repeated on several lines. My file contains lines like this:

1  $element_(1)
10 $element_(2)
20 $element_(1)
30 $element_(3)
40 $element_(1)
50 $element_(2)
60 $element_(3)
70 $element_(1)

I want to get the last occurrence of each of these elements and put them in a file resultfile.

50 $element_(2)
60 $element_(3)
70 $element_(1)

I tried

for  i in {1..8000} do 
     grep $element_\($i\) sourcefile | tail -1 >> resultfile 
done

But it gives me errors. Also, how do I distinguish between the $ that is part of the string I am searching for and the $ that expands the loop counter for the element number?

Also I don't know exactly how many elements I am going to have in the file so I took 8000 as a max value, but it can be less or more.

The last occurrence of each element in the file. I edited with a result above. — etudiant_is, Feb 26 '16 at 11:12
i don't want to sort based on the second column. Besides each element can be repeated like in a thousand line. I want to find the last time it appeared in the file for all the elements — etudiant_is, Feb 26 '16 at 11:19
basically the line inside the for loop works when I test it alone but when I try it inside the loop I get an error — etudiant_is, Feb 26 '16 at 11:54
try `for i in {1..3}; do grep "\$element_($i)" f1 | tail -1; done` — Fredrik Pihl, Feb 26 '16 at 11:56
Are the elements consecutive, i.e., if there is `element_(10)` but no `element_(11)`, do we know that we're done, or could there be gaps between element numbers? — Benjamin W., Feb 26 '16 at 14:50
Oh, and do the elements have to occur in the same order as in the input file? — Benjamin W., Feb 26 '16 at 14:52
Yes, the order of elements is as they appear but they don't appear necessarily in order. I don't need them sorted since I am going to keep each one id. — etudiant_is, Feb 26 '16 at 16:23

Benjamin W. · Accepted Answer · 2016-02-26T15:27:36.973

Output sorted by element index

You can tell grep to stop after finding the first match (-m 1), and to make this match the last in your file, you can pipe the file in reverse to grep:

for i in {1..8000}; do
    tac sourcefile | grep -m 1 "\$element_($i)"
done > resultfile

I've also moved the output redirection outside the loop and fixed the quoting in your pattern. The whole pattern is now double quoted. The first $ has to be escaped so the shell doesn't try to expand a variable named element_. Inside the quotes, the parentheses must not be escaped, or grep would receive \( and \) and treat them as a capture group. In your attempt, escaping them was correct because the pattern was unquoted and the shell removed the backslashes before grep saw them; quoting the whole pattern makes that unnecessary.

It's usually easier to single quote the pattern so we don't have to care about shell expansion, but in this case, we want $i to actually expand.
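To make the quoting rules above concrete, here is a minimal sketch that stores three spellings of the pattern and prints what grep would actually receive. The value given to element_ is hypothetical and is set only so that the unwanted expansion becomes visible.

```shell
#!/usr/bin/env bash
i=3
element_="oops"   # hypothetical value, only here to expose the expansion

escaped="\$element_($i)"     # \$ stays a literal $, $i expands
single='$element_($i)'       # single quotes: nothing expands
unescaped="$element_($i)"    # the shell expands $element_ as a variable

printf '%s\n' "$escaped" "$single" "$unescaped"
```

Only the first form gives grep the text $element_(3); the second keeps a literal $i, and the third loses the element_ prefix to variable expansion.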

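Your attempt had a syntax error: the ; was missing between the brace expansion and do, so the shell read do as one more word in the list instead of as the start of the loop body.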
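As a quick check of this section, the following sketch runs the corrected loop against the sample lines from the question. The temporary file and the reduced range {1..3} are assumptions made for the demonstration, and tac is assumed to be available (it ships with GNU coreutils).

```shell
#!/usr/bin/env bash
# Sample data from the question, written to a temporary file.
sourcefile=$(mktemp)
cat > "$sourcefile" <<'EOF'
1  $element_(1)
10 $element_(2)
20 $element_(1)
30 $element_(3)
40 $element_(1)
50 $element_(2)
60 $element_(3)
70 $element_(1)
EOF

# Note the ; before do. For each index, read the file bottom-up and
# keep the first hit, which is the last occurrence in the original order.
result=$(for i in {1..3}; do
    tac "$sourcefile" | grep -m 1 "\$element_($i)"
done)

printf '%s\n' "$result"
rm -f "$sourcefile"
```

With this input, the lines come out in index order: 70 for element 1, 50 for element 2, 60 for element 3.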

Output sorted by order of appearance in input file

If the lines have to be in the same order as in the input file, we can prepend line numbers (nl) and sort by them in the end (sort -n) before removing them again with cut:

for i in {1..8000}; do
    nl sourcefile | tac | grep -m 1 "\$element_($i)"
done | sort -n | cut -f 2- > resultfile
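The same pipeline can be tried on the question's sample data. This is a sketch under a few assumptions: the temporary file and the range {1..3} are for demonstration only, and cut -f 2- is used so that everything after the tab inserted by nl survives even if a line itself contains tabs.

```shell
#!/usr/bin/env bash
# Sample data from the question, written to a temporary file.
sourcefile=$(mktemp)
cat > "$sourcefile" <<'EOF'
1  $element_(1)
10 $element_(2)
20 $element_(1)
30 $element_(3)
40 $element_(1)
50 $element_(2)
60 $element_(3)
70 $element_(1)
EOF

# nl prefixes each line with its number and a tab; after the search,
# sort -n restores file order and cut drops the number again.
result=$(for i in {1..3}; do
    nl "$sourcefile" | tac | grep -m 1 "\$element_($i)"
done | sort -n | cut -f 2-)

printf '%s\n' "$result"
rm -f "$sourcefile"
```

Here the last occurrences sit on lines 6, 7 and 8 of the input, so the result lists element 2, then 3, then 1, which matches the output requested in the question.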

Stop after first unsuccessful search

If we know that the element indices are contiguous and we can stop as soon as we don't find an element, we can tweak the loop as follows (still assuming we want to keep elements in order of appearance in the input file):

i=0
while true; do
    ((++i))
    nl sourcefile | tac | grep -m 1 "\$element_($i)" || break
done | sort -n | cut -f 2- > resultfile

This uses an increasing counter instead of a predetermined sequence. The exit status of a pipeline is that of its last command, here grep. If it is non-zero, i.e., grep couldn't find the element, we exit the loop.
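The exit status that drives the break can be observed in isolation. The sketch below uses a hypothetical two-line input and assumes the default shell behaviour, i.e., pipefail is not set.

```shell
#!/usr/bin/env bash
# The status of each pipeline is grep's: 0 on a match, 1 on no match.
printf 'a\nb\n' | tac | grep -q -m 1 'b' && status_hit=0 || status_hit=$?
printf 'a\nb\n' | tac | grep -q -m 1 'z' && status_miss=0 || status_miss=$?

echo "hit: $status_hit, miss: $status_miss"
```

A status of 1 is what triggers || break in the loop, so the search ends at the first index that does not occur in the file.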

Thanks. yes the error was from the missing ; but I don't know why I have to put them, in the for loop syntax, they are not there. — etudiant_is, Feb 26 '16 at 16:21
If you check the [manual](https://www.gnu.org/software/bash/manual/bashref.html#Looping-Constructs), you'll see that the semicolon is optional when you loop over positional parameters, which you don't have to put explicitly (`for param do echo "$param"; done`), but in any other case, there either has to be a newline or a semicolon. — Benjamin W., Feb 26 '16 at 16:50
