To eliminate redundancy in text files, I found that KDiff3 has the needed functionality -- just keep the uncolored text. But attempts to automerge flag the text files as non-UTF-8 -- even after re-saving them as UTF-8.
file -I FN.EXT
shows them to be binary.
I tried AWK and iconv, as follows.
awk '/[\x80-\xFF]/ { print }' test.txt
iconv -c -t ASCII 84-0.txt > test-2.txt
but it didn't convert to ASCII or UTF-8. iconv needs a recognizable input encoding. So I put together 3 lines of code to accomplish 3 conversions:
1) from TXT to PDF (on macOS).
2) from PDF to HTML.
3) from HTML to TXT.
as follows:
cupsfilter test.txt > test.pdf 2> /dev/null
pdftohtml test.pdf test-2.html
textutil -convert txt test-2.html
This works on a single file, but not in batch -- preferably on a nested folder. How is a redirection like this converted to find/exec? (Redirecting to {}.txt produced a file literally named "{}.txt".)
The filename is changed to avoid overwriting the original TXT file. The conversion to HTML creates 3 files, of which I use only 1.
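For reference, here is a rough sketch of how the three commands might be wrapped for find/exec. It is untested on a real nested folder. The literal "{}.txt" most likely appears because the redirection is handled by the calling shell before find ever runs, so find never gets to substitute the filename. Passing the path to a child sh as "$1" lets the redirection happen once per file. The function name batch_convert is made up for illustration, and the "-2" suffix simply follows the naming used above.

```shell
# batch_convert ROOT: run the three-step conversion on every .txt under ROOT.
# (batch_convert is a hypothetical helper name, not an existing tool.)
batch_convert() {
    # The inner script is single-quoted, so its redirections are performed by
    # the child sh, once per file, after find has passed the path in as "$1".
    # Files ending in -2.txt are skipped so earlier output is not reconverted.
    find "$1" -type f -name '*.txt' ! -name '*-2.txt' -exec sh -c '
        src=$1
        base=${src%.txt}                        # path without the .txt suffix
        cupsfilter "$src" > "$base.pdf" 2> /dev/null || exit
        pdftohtml "$base.pdf" "$base-2.html" > /dev/null || exit
        textutil -convert txt "$base-2.html"    # writes "$base-2.txt"
    ' sh {} \;
}
```

Calling batch_convert /path/to/folder would then leave a NAME-2.txt next to each NAME.txt. The intermediate PDF and HTML files are left in place in this sketch.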
Suggestions appreciated!