The pipeline below gets me the number of distinct words (ignoring case) in a file. But how do I delete the files whose count is less than 2?
$ cat ./a1esso.doc | grep -o -E '\w+' | sort -u -f | wc --words
1
$ cat ./a1brit.doc | grep -o -E '\w+' | sort -u -f | wc --words
4
How do I capture the names of the files whose count is less than 2, so that they can be deleted? I will be scanning millions of files. A find command can list all the files, but it seems the filename needs to be carried through the pipeline, so that rm can be applied at the right-hand end.
Thanks for reading.
Update:
The correct answer will use an input pipeline to feed in the filenames. This is not negotiable. The program is not meant for the single input file shown in the example; its input is a dynamic list of many files.
The accepted answer will also include a filter stage that identifies the names of the files meeting the criterion. This is not negotiable either.
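To make the two requirements concrete, here is a minimal sketch of what such a filter stage might look like. It is only an illustration: the function name few_words and its threshold argument are made up, the *.doc pattern is assumed from the example filenames, and it relies on GNU grep plus an xargs that supports -0. Filenames go in on stdin and the matching filenames come out on stdout, both NUL-delimited so that unusual names survive the trip.

```shell
# Sketch only; few_words is a hypothetical helper name, not an existing tool.
# Assumes GNU grep (for \w and -o) and an xargs that supports -0.
# Reads NUL-delimited filenames on stdin and writes, NUL-delimited, the names
# of files with fewer than N distinct words (case-insensitive). N defaults to 2.
few_words() {
    xargs -0 sh -c '
        min=$1
        shift
        for file in "$@"; do
            # Same counting pipeline as in the question, minus the cat.
            count=$(grep -o -E "\w+" -- "$file" | sort -u -f | wc -w)
            if [ "$count" -lt "$min" ]; then
                printf "%s\0" "$file"
            fi
        done
    ' sh "${1:-2}"
}

# Dry run, listing the candidates one per line:
#   find . -type f -name '*.doc' -print0 | few_words 2 | tr '\0' '\n'
# Actual deletion (-r is a GNU xargs option that skips rm on empty input):
#   find . -type f -name '*.doc' -print0 | few_words 2 | xargs -0 -r rm --
```

Note that an empty or unreadable file yields a count of 0 and is therefore listed, so a dry run before deleting seems prudent. The sketch also starts three processes per file, which may be slow across millions of files.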