
My input file has the values below:

93401140232
93403047213
93443757143
93451675101
93473360818
93451495679
93403707762
93403966944
93441513846
93472692526

I want to output only the lines that start with the prefixes "93441", "93443", "93451", or "93472". I thought of using the command below:

awk -F, 'substr($1,0,5) in ("93441","93443","93451","93472") {print $0}' sample.txt

My command does not run. I think I cannot use the "in" operator this way in awk. What is an effective alternative?

Cyrus
ramki_ramakrishnan

2 Answers


I suggest the following solution: pass the numbers as a comma-separated string and parse them into an array in the BEGIN block, at the beginning of the script:

awk -F, -v s=93441,93443,93451,93472 \
    'BEGIN{split(s,a);for(i in a){b[a[i]]}} substr($1,1,5) in b' input.txt

Note: string indices, and the indices of arrays filled by split(), start at 1 in awk, not 0. That is why the call is substr($1, 1, 5).
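
For readers who find the one-liner dense, here is a minimal expanded sketch of the same idea. The temporary directory, the names prefixes, list and wanted, and the explicit "," separator passed to split() are illustrative choices, not part of the command above.

```shell
# Build a throwaway copy of the sample input (the path is illustrative).
tmpdir=$(mktemp -d)
printf '%s\n' 93401140232 93403047213 93443757143 93451675101 93473360818 \
    93451495679 93403707762 93403966944 93441513846 93472692526 > "$tmpdir/input.txt"

result=$(awk -v prefixes=93441,93443,93451,93472 '
    BEGIN {
        # split() fills list[1..n] with the individual prefixes.
        n = split(prefixes, list, ",")
        # Referencing wanted[key] creates that key; "in" tests keys, not values.
        for (i = 1; i <= n; i++) wanted[list[i]]
    }
    # A pattern without an action prints every line for which it is true.
    substr($1, 1, 5) in wanted
' "$tmpdir/input.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The key point is that "in" checks whether a string is an index of an array, which is why the prefixes are copied from the values of one array into the keys of another.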


Another common solution is to process two files: the input file and a file which contains the search values:

search.txt

93441
93443
93451
93472

Then use the following awk command:

awk 'NR==FNR{a[$0];next} substr($1,1,5) in a' search.txt input.txt
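
As a rough, self-contained sketch of how the two-file form works, the script below recreates both files in a temporary directory (an assumption made only so the example can run anywhere) and spells out the NR==FNR test in comments. The array name wanted is illustrative.

```shell
# Recreate search.txt and the sample input in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 93441 93443 93451 93472 > "$tmpdir/search.txt"
printf '%s\n' 93401140232 93403047213 93443757143 93451675101 93473360818 \
    93451495679 93403707762 93403966944 93441513846 93472692526 > "$tmpdir/input.txt"

matched=$(awk '
    # NR counts lines across all files; FNR restarts at 1 for each file.
    # They are equal only while the first file (search.txt) is being read.
    NR == FNR { wanted[$0]; next }
    # Second file: print lines whose first five characters are a stored key.
    substr($1, 1, 5) in wanted
' "$tmpdir/search.txt" "$tmpdir/input.txt")

printf '%s\n' "$matched"
rm -rf "$tmpdir"
```

One caveat with this idiom: if the first file is empty, NR==FNR stays true while the second file is read, so nothing is printed.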
hek2mgl

For a large data set I'd use what @hek2mgl suggests, as it should be the faster solution. Otherwise, this might be what you were trying to do:

$ awk 'index("93441,93443,93451,93472",substr($1,1,5))' file
93443757143
93451675101
93451495679
93441513846
93472692526
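
A small sketch of one thing to keep in mind with index(): it is a substring search that returns the 1-based position of the match, or 0 if there is none, so a line shorter than five characters can match part of a listed prefix. The short input line 9344 and the length() guard below are illustrative additions, not part of the command above.

```shell
# Without a guard, the 4-character line "9344" matches inside "93441".
unguarded=$(printf '%s\n' 9344 93441513846 |
    awk 'index("93441,93443,93451,93472", substr($1, 1, 5))')

# Requiring at least five characters rules out partial matches.
guarded=$(printf '%s\n' 9344 93441513846 |
    awk 'length($1) >= 5 && index("93441,93443,93451,93472", substr($1, 1, 5))')

printf 'unguarded:\n%s\nguarded:\n%s\n' "$unguarded" "$guarded"
```

With input like the sample file, where every line is eleven digits long, the guard makes no difference.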
Ed Morton