awk select input files from list

Question

Basic awk question, but I can't seem to find an answer anywhere:

I have a folder of 50000 txt files and would like to run awk searches on a subset of them. I've saved the filenames I want to limit the search to in a separate document. Restricting the search to those files would greatly speed it up; at the moment it looks like this:

awk -F "searchTerm" '{print NF-1}' data/output/* >> output.txt

Many thanks

sampson-chen · Accepted Answer · 2012-11-23T21:39:15.013

Suppose that your file containing the subset that you want to search is called subset.txt and its content has this format (each file on a separate line):

file1.txt
file2.txt
file3.txt
...
fileN.txt

Then this will do the trick:

awk -F "searchTerm" '{print NF-1}' $(<subset.txt) >> output.txt

Explanation:

$(<subset.txt) is expanded by the shell to the contents of subset.txt, so each listed file is passed to awk as an input file argument. (See Jonathan Leffler's comment below.)

I should also point out that -F "searchTerm" is actually setting the Field Separator (the delimiter used by awk to split each line) to searchTerm, so NF-1 is the number of times searchTerm occurs on the line. If you instead want to print the Number of Fields - 1 on each line that contains "searchTerm", do:

awk '/searchTerm/ {print NF-1}' $(cat subset.txt) >> output.txt
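
To make the difference between the two commands concrete, here is a minimal sketch on a single made-up line. The sample text and the separator "foo" are only illustrative assumptions, standing in for real data and for searchTerm:

```shell
# One sample line (hypothetical data) containing the separator "foo" twice.
line='a foo b foo c'

# With -F 'foo' the line is split into 3 fields around "foo",
# so NF-1 is the number of occurrences of "foo".
occurrences=$(printf '%s\n' "$line" | awk -F 'foo' '{print NF-1}')

# With a /foo/ pattern and default whitespace splitting there are 5 fields,
# so NF-1 is the word count minus one, printed only for matching lines.
words_minus_one=$(printf '%s\n' "$line" | awk '/foo/ {print NF-1}')

echo "$occurrences"      # 2
echo "$words_minus_one"  # 4
```

So the -F form counts occurrences of the term, while the /pattern/ form only filters lines and counts whitespace-separated fields.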

edited Nov 23 '12 at 21:39

answered Nov 23 '12 at 21:03

sampson-chen

In `bash`, you can avoid the `cat` process with `$(<subset.txt)`. – Jonathan Leffler Nov 23 '12 at 21:32
The above will fail for filenames containing spaces, globbing characters, etc. The right way to do it is `while IFS= read -r file; do awk '...' "$file"; done < subset.txt` – Ed Morton Nov 23 '12 at 22:22
@Ed `while read...` will fail on filenames containing newlines. – William Pursell Nov 24 '12 at 01:26
@WilliamPursell Absolutely. Diminishing returns.... but you're right I should have said that. – Ed Morton Nov 24 '12 at 01:41
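
The loop from Ed Morton's comment can be sketched as follows. The scratch directory, the file names and their contents are hypothetical, created here only so the example runs on its own; the awk program is the one from the question:

```shell
# Hypothetical scratch data: one file whose name contains a space, one without.
workdir=$(mktemp -d)
printf 'x searchTerm y searchTerm\n' > "$workdir/has space.txt"
printf 'no match here\n' > "$workdir/plain.txt"
printf '%s\n' "$workdir/has space.txt" "$workdir/plain.txt" > "$workdir/subset.txt"

# Read one filename per line. IFS= keeps leading/trailing blanks and
# -r keeps backslashes, so each name reaches awk unchanged.
while IFS= read -r file; do
    awk -F 'searchTerm' '{print NF-1}' "$file"
done < "$workdir/subset.txt" > "$workdir/output.txt"

cat "$workdir/output.txt"   # prints 2, then 0
```

Note that this starts one awk process per listed file, so for a long list it will cost more than a single awk invocation, and, as pointed out above, it still breaks on filenames containing newlines.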

score 0 · Answer 2 · answered Nov 23 '12 at 21:05

I think this will work for you.

awk '/searchTerm/{print $(NF-1)}' data/output/* >> output.txt

ddoxey

score 0 · Answer 3 · answered Nov 10 '15 at 23:37

If you have your list in a file called filelist.txt, you could just use the stdout from a cat command.

awk -F "searchTerm" '{print NF-1}' `cat data/output/filelist.txt` >> output.txt
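
Backticks, like $(...), leave it to the shell to split the list into words, so names with spaces get broken up. A different technique, not taken from any of the answers here, is to let awk read the list itself and append each line to ARGV. The following is only a sketch under that assumption; the scratch directory and files are hypothetical:

```shell
# Hypothetical scratch data so the sketch runs on its own.
listdir=$(mktemp -d)
printf 'x searchTerm y searchTerm\n' > "$listdir/has space.txt"
printf 'no match here\n' > "$listdir/plain.txt"
printf '%s\n' "$listdir/has space.txt" "$listdir/plain.txt" > "$listdir/filelist.txt"

# While reading the first file (the list), NR == FNR holds: append each line
# to ARGV so awk opens it as an input file later, then skip to the next line.
# The second block runs only for lines of the files named in the list.
awk -F 'searchTerm' '
    NR == FNR { ARGV[ARGC++] = $0; next }
    { print NF-1 }
' "$listdir/filelist.txt" > "$listdir/output.txt"

cat "$listdir/output.txt"   # prints 2, then 0
```

This runs a single awk process and is not affected by spaces or globbing characters in names. One caveat: awk treats an argument of the form var=value as a variable assignment, so a listed name of that shape would not be opened as a file.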

jeffpkamp
