Problem:
I need to match an exact format for a mailing machine software program. It expects a certain format. I can count the number of new lines, carriage returns, tabs ...etc. using tools like
cat -vte
and
od -c
and
wc -l ( or wc -c )
However, I'd like to know the exact number of leading and trailing spaces between characters and sections of text. Tabs as well.
Question:
How would you go about analyzing then matching a template exactly using common unix tools + perl or python? One-liners preferred. Also, what's your advice for matching a DOS encoded file? Would you translate it to NIX first, then analyze, or leave, as is?
UPDATE
Using this to see individual spaces [ assumes no '%' chars in file ]:
sed 's/ /%/g' filename.000
Plan to build a script that analyzes each line's tab and space content.
Using @shiplu's solution with a nod to the anti-cat crowd:
while read l;do echo $l;echo $((`echo $l | wc -c` - `echo $l | tr -d ' ' | wc -c`));done<filename.000
Still needs some tweaks for Windows but it's well on it's way.
SAMPLE TEXT
Key for reading:
newlines marked with \n
Carriage returns marked with \r
Unknown space/tab characters marked with [:space:] ( need counts on those )
\r\n
\n
[:space:]Institution Anon LLC\r\n
[:space:]123 Blankety St\r\n
[:space:]Greater Abyss, AK 99999\r\n
\n
\n
[:space:] 10/27/2011\r\n
[:space:]Requested materials are available for pickup:\r\n
[:space:]e__\r[:space:] D_ \r[:space:] _O\r\n
[:space:]Bathtime for BonZo[:space:] 45454545454545[:space:] 10/27/2011\r\n
[:space:]Bathtime for BonZo[:space:] 45454545454545[:space:] 10/27/2011\r\n
\n
\n
\n
\n
\n
\n
[:space:] Pantz McManliss\r\n
[:space:] Gibberish Ave\r\n
[:space:] Northern Mirkwood, ME 99999\r\n
( untold variable amounts of \n chars go here )
UPDATE 2
Using IFS with read gives similar results to the ruby posted by someone below.
while IFS='' read -r line
do
printf "%s\n" "$line" | sed 's/ /%/g' | grep -o '%' | wc -w
done < filename.000