
To eliminate redundancy in text files, I found that KDiff3 has the functionality I need -- just keep the uncolored text. But attempts to auto-merge flag the text files as non-UTF-8, even after re-saving them as UTF-8.

file -I FN.EXT reports them as binary. I tried awk and iconv, as follows:

awk '/[\x80-\xFF]/ { print }' test.txt
iconv -c -t ASCII 84-0.txt > test-2.txt

but this did not convert the file to ASCII or UTF-8. iconv needs a recognizable input encoding. So I put together three lines of code to perform three conversions.

Code:

1) from TXT to PDF (on macOS).
2) from PDF to HTML.
3) from HTML to TXT.

as follows:

cupsfilter test.txt > test.pdf  2> /dev/null
pdftohtml test.pdf test-2.html
textutil -convert txt test-2.html

This works, but not in batch, and I would prefer to run it on a nested folder. How can this sequence of commands be converted to find -exec? (Redirecting to {}.txt produced a file literally named "{}.txt".)

The filename is changed to avoid overwriting the original TXT file. The conversion to HTML creates three files, of which I use only one.
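
One possible approach, offered as a sketch rather than a tested solution: a redirection such as > {}.txt is interpreted by the calling shell before find starts, so find never gets the chance to substitute {}, which would explain the file literally named "{}.txt". Handing the matched files to a small sh -c script lets the redirection happen once per file instead. The function name convert_tree, the variable names, and the rule that skips files already ending in -2.txt are assumptions made for illustration; the three conversion commands are the ones from the question, unchanged.

```shell
#!/bin/sh
# Sketch: apply the three-step conversion to every .txt file under a folder.
# convert_tree is a hypothetical helper name, not part of any tool.
convert_tree() {
    root=${1:-.}    # folder to walk; defaults to the current directory

    # Assumption: files ending in -2.txt are earlier outputs, so skip them.
    find "$root" -type f -name '*.txt' ! -name '*-2.txt' -exec sh -c '
        for src in "$@"; do
            base=${src%.txt}    # path without the .txt extension
            # Each step runs only if the previous one succeeded.
            cupsfilter "$src" > "$base.pdf" 2> /dev/null &&
            pdftohtml "$base.pdf" "$base-2.html" &&
            textutil -convert txt "$base-2.html"
        done
    ' sh {} +
}
```

Calling convert_tree /path/to/folder (the path is a placeholder) would walk that folder and its subfolders. The intermediate PDF and HTML files are left next to each source file, so they may need to be cleaned up afterwards.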

Suggestions appreciated !!

Compo