grep matching specific position in lines using words from other file

Question

I have 2 files.

file1:

12342015010198765hello
12342015010188765hello
12342015010178765hello

Each line contains fields at fixed positions; for example, positions 13-17 hold the account_id.

file2:

98765
88765

which contains a list of account_ids.

In Korn Shell, I want to print the lines from file1 whose positions 13-17 match one of the account_ids in file2.

I can't do

grep -f file2 file1

because an account_id in file2 can also match other fields at other positions.

I have tried using a pattern like this in file2:

^.{12}98765.*

but it did not work.

John1024 · Accepted Answer · 2015-07-10T04:59:28.247

Using awk

$ awk 'NR==FNR{a[$1]=1;next;} substr($0,13,5) in a' file2 file1
12342015010198765hello
12342015010188765hello

How it works

NR==FNR{a[$1]=1;next;}

FNR is the number of lines read so far from the current file and NR is the total number of lines read so far. Thus, if FNR==NR, we are reading the first file which is file2.

Each ID in file2 is saved as a key in array a. Then, we skip the rest of the commands and jump to the next line.

substr($0,13,5) in a

If we reach this command, we are working on the second file, file1.

This condition is true if the 5-character substring that starts at position 13 is a key in array a. If the condition is true, then awk performs the default action, which is to print the line.
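As a minimal sketch of the steps above, the same command can be written with the start position and field length passed in as awk variables. The names start, len and ids are illustrative choices, not part of the original one-liner, and the sample files are recreated in a scratch directory (this assumes mktemp -d is available on your system):

```shell
# Work in a scratch directory so the sample files do not clobber real data.
tmpdir=$(mktemp -d) && cd "$tmpdir" || exit 1

# Sample data copied from the question.
printf '%s\n' 12342015010198765hello 12342015010188765hello 12342015010178765hello > file1
printf '%s\n' 98765 88765 > file2

# start and len are hypothetical variable names, passed in with -v.
awk -v start=13 -v len=5 '
    NR == FNR { ids[$1] = 1; next }   # first file (file2): remember each ID
    substr($0, start, len) in ids     # second file (file1): print on a match
' file2 file1 > matched.txt

cat matched.txt
```

One caveat with the NR==FNR idiom: if file2 is empty, awk never leaves the "first file" branch for file1, so nothing is printed.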

Using grep

You mentioned trying

grep '^.{12}98765.*' file1

That uses extended regex syntax which means that -E is required. Also, there is no value in matching .* at the end: it will always match. Thus, try:

$ grep -E '^.{12}98765' file1
12342015010198765hello

To get both lines:

$ grep -E '^.{12}[89]8765' file1
12342015010198765hello
12342015010188765hello

This works because [89]8765 just happens to match the IDs of interest in file2. The awk solution, of course, provides more flexibility in what IDs to match.
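If the IDs should come from file2 rather than being hard-coded, one option is to turn each ID into an anchored pattern and hand the result to grep -f. This is only a sketch: it assumes the IDs contain no regex metacharacters and that file2 has no blank lines, and patterns.txt is a hypothetical file name:

```shell
# Work in a scratch directory so the sample files do not clobber real data.
tmpdir=$(mktemp -d) && cd "$tmpdir" || exit 1

# Sample data copied from the question.
printf '%s\n' 12342015010198765hello 12342015010188765hello 12342015010178765hello > file1
printf '%s\n' 98765 88765 > file2

# Prefix every ID with ^.{12} so it can only match at positions 13-17.
sed 's/^/^.{12}/' file2 > patterns.txt

# -E enables the {12} interval syntax; -f reads one pattern per line.
grep -E -f patterns.txt file1 > matched.txt

cat matched.txt
```

A blank line in file2 would become the pattern ^.{12}, which matches every line of at least 12 characters, so it is worth cleaning the ID list first.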

Jahid · Answer 2 · 2015-07-10T05:35:11.027

Using sed with extended regex:

sed -r 's@.*@/^.{12}&/p@' file2 |sed -nr -f- file1

Using Basic regex:

sed 's@.*@/^.\\{12\\}&/p@' file2 |sed -n -f- file1

Explanation:

sed -r 's@.*@/^.{12}&/p@' file2

will generate an output:

/^.{12}98765/p
/^.{12}88765/p

which is then used as a sed script for the next sed after pipe, which outputs:

12342015010198765hello
12342015010188765hello
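If reading the script from standard input with -f- is not available in your sed, the same idea can be sketched with an intermediate script file and basic regex only. The file name script.sed is a hypothetical choice, and the sample files are recreated in a scratch directory:

```shell
# Work in a scratch directory so the sample files do not clobber real data.
tmpdir=$(mktemp -d) && cd "$tmpdir" || exit 1

# Sample data copied from the question.
printf '%s\n' 12342015010198765hello 12342015010188765hello 12342015010178765hello > file1
printf '%s\n' 98765 88765 > file2

# Turn each ID into a sed command of the form /^.\{12\}ID/p
sed 's@.*@/^.\\{12\\}&/p@' file2 > script.sed

# Run the generated script against file1; -n prints only lines hit by a p command.
sed -n -f script.sed file1 > matched.txt

cat matched.txt
```

Each generated command prints independently, so an ID listed twice in file2 would print its matching lines twice.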

edited Jul 10 '15 at 05:35

answered Jul 10 '15 at 05:28


score 0 · Answer 3 · answered Aug 07 '20 at 11:28

Using Grep

The most convenient approach is to put each alternative pattern on a separate line of a file.

You can look at this question:

grep multiple patterns single file argument list too long

Juan Miguel Díaz Pérez
