How to give multiple values in a substring condition in awk

Question

My input file has the values below:

93401140232
93403047213
93443757143
93451675101
93473360818
93451495679
93403707762
93403966944
93441513846
93472692526

I want to keep only the lines that start with the prefixes "93441", "93443", "93451", or "93472". I thought of using the command below:

awk -F, 'substr($1,0,5) in ("93441","93443","93451","93472") {print $0}' sample.txt

My command does not run. I think I cannot use the "in" operator in awk. What is an effective alternative?

`awk '/^93441|^93443|^93451|^93472/'` – dawg, Nov 14 '20 at 16:55
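The pattern in the comment above can also be written with a single anchor and a group of alternatives. A minimal sketch, assuming the sample data from the question is saved as sample.txt (the file name used in the question's command):

```shell
# Recreate the sample data from the question.
printf '%s\n' 93401140232 93403047213 93443757143 93451675101 93473360818 \
    93451495679 93403707762 93403966944 93441513846 93472692526 > sample.txt

# Anchor once at the start of the line, then list the allowed prefixes.
# A pattern with no action prints every matching line.
awk '/^(93441|93443|93451|93472)/' sample.txt
```

This is convenient for a handful of fixed prefixes; the array-based answers are easier to maintain when the list grows or comes from a file.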

hek2mgl · Answer 1 · 2020-11-14T11:46:50.023

I suggest the following solution: pass the numbers as a comma-separated string and parse them into an array in the BEGIN block, at the beginning of the script:

awk -F, -v s=93441,93443,93451,93472 \
    'BEGIN{split(s,a);for(i in a){b[a[i]]}} substr($1,1,5) in b' input.txt

Note: string and array indices start at 1 in awk, not 0. Therefore it's substr($1, 1, 5).
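A quick way to check the indexing, as a minimal sketch using one value from the question's input:

```shell
# substr(s, m, n) returns n characters of s starting at position m,
# where the first character is at position 1.
echo 93441513846 | awk '{ print substr($1, 1, 5) }'   # prints 93441
```

A start position of 0 is best avoided, since the result is not guaranteed to be the same five characters in every awk implementation.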

Another common solution is to process two files: the input file and a file which contains the search values:

search.txt

Then use the following awk command:

awk 'NR==FNR{a[$0];next} substr($1,1,5) in a' search.txt input.txt
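As a minimal end-to-end sketch of the two-file approach: the contents of search.txt below are an assumption (one prefix per line, taken from the question), and input.txt is assumed to hold the question's sample data.

```shell
# Assumed layout of search.txt: one search prefix per line.
printf '%s\n' 93441 93443 93451 93472 > search.txt

# Sample data from the question.
printf '%s\n' 93401140232 93403047213 93443757143 93451675101 93473360818 \
    93451495679 93403707762 93403966944 93441513846 93472692526 > input.txt

# NR==FNR is true only while the first file (search.txt) is being read:
# each of its lines becomes a key of array a, and "next" skips the second rule.
# For input.txt, a line is printed when its first five characters are a key of a.
awk 'NR==FNR { a[$0]; next } substr($1, 1, 5) in a' search.txt input.txt
```

Keeping the prefixes in a separate file means the list can change without touching the awk command.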

Ed Morton · Answer 2 · 2020-11-14T14:15:26.643


For a large data set I'd use what @hek2mgl suggests, as it should be the faster solution. Otherwise, this might be what you were trying to do:

$ awk 'index("93441,93443,93451,93472",substr($1,1,5))' file
93443757143
93451675101
93451495679
93441513846
93472692526
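One thing to keep in mind is that index() performs a plain substring search, so a line shorter than five characters, such as 934, would also be found inside the list string. The variant below is not part of the answer; it is a hedged sketch that wraps both the list and the key in commas so that only whole entries match:

```shell
# Two test lines: a short value that should be rejected and a valid one.
printf '%s\n' 934 93441513846 |
    awk 'index(",93441,93443,93451,93472,", "," substr($1, 1, 5) ",")'
# prints only 93441513846
```

For input like the sample in the question, where every line has eleven digits, the original command and this variant select the same lines.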

edited Nov 14 '20 at 14:15

answered Nov 14 '20 at 13:28
