4

I have file1.txt with content:

rs002
rs113
rs209
rs227
rs151 
rs104

I have file2.txt with content:

rs113   113
rs002   002
rs227   227
rs209   209
rs104   104
rs151   151

I want to get the lines of file2.txt that match the records in file1.txt, for which I tried:

grep -Fwf file1.txt file2.txt 

with output as follows:

rs113   113
rs002   002
rs227   227
rs209   209
rs104   104
rs151   151

This extracts all the matching lines, but it is in the order of occurrence in file2.txt. Is there any way to extract the matching records while maintaining the order from file1.txt? The desired output is as follows:

rs002   002
rs113   113
rs209   209
rs227   227
rs151   151
rs104   104
  • Have you tried reversing the arguments - `grep -Fwf file2.txt file1.txt` – adarshr Apr 09 '16 at 18:43
  • @adarshr That won't work. What this grep command does is basically use the first file as the patterns you're looking for and the second file as the file in which you're looking for the patterns. As far as I know, you can't trick the sorting order just by using the grep command. Maybe awk or comm could help (not sure). – randombee Apr 09 '16 at 20:10
  • @adarshr tried reversing the files, but as randombee said, the first file is the one with the particular patterns that we want the second file to follow while subsetting. – reneesummer Apr 09 '16 at 20:17

4 Answers

2

One (admittedly not very elegant) solution is to loop over file1.txt and look for a match for each line:

while IFS= read -r line; do
    grep -wF "$line" file2.txt
done < file1.txt

which gives the output

rs002   002
rs113   113
rs209   209
rs227   227
rs151   151
rs104   104

If you know that each pattern matches at most once, this can be sped up a bit by telling grep to stop after the first match:

grep -m 1 -wF "$line" file2.txt

This is a GNU extension, as far as I can tell.
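
For completeness, the loop with the early exit then becomes:

while IFS= read -r line; do
    grep -m 1 -wF "$line" file2.txt
done < file1.txt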

Notice that looping over a file to do some processing on another file in each iteration is usually a sign that there is a much more efficient way to do things, so this approach should probably be reserved for files small enough that coming up with a better solution would take longer than just processing them like this.

Benjamin W.
  • I will come back and vote yes once I reach 15 reputation! – reneesummer Apr 09 '16 at 21:57
  • and there's the danger of posting a solution like this - that a newbie might actually think it's the right approach rather than just an interesting anecdote. reneesummer please read http://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice as Benjamin suggested and then accept [@Thor's answer](http://stackoverflow.com/a/36524323/1745001) which I'm sure Benjamin would agree is the right solution. – Ed Morton Apr 10 '16 at 16:33
2

This is too complicated for grep. If file2.txt is not huge, i.e., it fits into memory, you should probably be using awk:

awk 'FNR==NR { f2[$1] = $2; next } $1 in f2 { print $1, f2[$1] }' file2.txt file1.txt

Output:

rs002 002
rs113 113
rs209 209
rs227 227
rs151 151
rs104 104
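
The first block runs while reading file2.txt (FNR==NR is only true for the first input file) and fills the array f2 with the second column keyed by the first; the second block then prints file1.txt's IDs in order together with the stored values. Since print $1, f2[$1] joins the fields with a single space, the spacing differs slightly from file2.txt; if you want the original lines verbatim, a small variation along the same lines (a sketch, not from the original answer) stores the whole line instead:

awk 'FNR==NR { f2[$1] = $0; next } $1 in f2 { print f2[$1] }' file2.txt file1.txt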
Thor
0

Create a sed-command file from file2:

sed 's#^\([^ ]*\)\(.*\)#/\1/ s/$/\2/#' file2 > tmp.sed
sed -f tmp.sed file1
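
With the sample file2 above, tmp.sed would contain one command per line of file2, of the form:

/rs113/ s/$/   113/
/rs002/ s/$/   002/

Each command appends the second column to any line of file1 that matches the corresponding ID, so the output comes out in file1's order.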

These two lines can be combined, avoiding the temporary file:

sed -f <(sed 's#^\([^ ]*\)\(.*\)#/\1/ s/$/\2/#' file2) file1
Walter A
-1

This should help (but will not be optimal for big input):

$ for line in `cat file1.txt`; do grep $line file2.txt; done
Vadim Key
    Please use `"$line"` for lines with more than 1 word. – Walter A Apr 10 '16 at 15:00
  • Entirely the wrong approach with many issues. See http://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice – Ed Morton Apr 10 '16 at 16:39