
What I would like to do is the following.

Text file content :

This is a simple text file
containing lines of text
with different width
but I would like to justify
them. Any idea ?

Expected result :

This is a simple text file containing
lines of  text with  different  width
but I  would  like  to  justify them.
Any Idea ?

I can already wrap my file at the required width using:

cat textfile|fmt -s -w 37

But in that case, there is no justification...

EDIT: Using par as suggested, I found a problem with accented characters. This is what `par 37j1` gives me:

This   is   à  simplé   text   file
containing   lines   of  tèxt   with
different wïdth but I woùld like to
justîfy them. Any idéà ?

The text is not justified anymore, yet the spacing has been altered anyway...

Thanks for your help,

Slander

  • Maybe you could provide a REAL text. Your sample input was ASCII only, so why be surprised? – clt60 Mar 22 '14 at 19:15
  • Well I am French but I provided english text on an english website. I've done that to be clearer and understood by english readers. Next time I will provide a French text, I promise ;) – Slander Mar 22 '14 at 19:54
  • It is not the language that matters but the text encoding. For example, you could use Klingon too, as long as it is UTF-8 encoded... :) – clt60 Mar 22 '14 at 19:56
  • I use UTF-8 encoded text. Is that the problem? I don't understand, because `VAR="abc"; echo ${#VAR}` and `VAR="àbc"; echo ${#VAR}` both give me 3, which is correct. No encoding issue here... What did I miss? – Slander Mar 22 '14 at 20:12

2 Answers


You can use nroff, the same formatter that man uses.

(echo '.ll 37'
 echo '.pl 0'
 cat orig.txt) | nroff

from your input produces:

This is a simple text file containing
lines of text  with  different  width
but I would like to justify them. Any
idea ?

The above WORKS ONLY WITH ASCII.

EDIT

If you want to handle UTF-8 text with nroff, you can try the following:

cat orig.txt | (        #yes, i know - UUOC
    echo '.ll 37'     #line length
    echo '.pl 0'      #page length (0-disables empty lines)
    echo '.nh'        #no hyphenation
    preconv -e utf8 -
) | groff -Tutf8

From this utf8 encoded input:

Voix ambiguë d'un cœur qui au zéphyr préfère les jattes de kiwi.
Voyez le brick géant que j'examine près du wharf.
Monsieur Jack, vous dactylographiez bien mieux que votre ami Wolf.
Eble ĉiu kvazaŭ-deca fuŝĥoraĵo ĝojigos homtipon..
Laŭ Ludoviko Zamenhof bongustas freŝa ĉeĥa manĝaĵo kun spicoj.
Nechť již hříšné saxofony ďáblů rozezvučí síň úděsnými tóny waltzu, tanga a
quickstepu.

produces:

Voix  ambiguë d’un cœur qui au zéphyr
préfère les jattes de kiwi.  Voyez le
brick  géant  que  j’examine  près du
wharf.     Monsieur    Jack,     vous
dactylographiez  bien mieux que votre
ami  Wolf.   Eble   ĉiu   kvazaŭ‐deca
fuŝĥoraĵo   ĝojigos  homtipon..   Laŭ
Ludoviko  Zamenhof  bongustas   freŝa
ĉeĥa  manĝaĵo  kun spicoj.  Nechť již
hříšné saxofony ďáblů  rozezvučí  síň
úděsnými   tóny   waltzu,   tanga   a
quickstepu.

If you delete the line

echo '.nh'   #no hyphenation

you will get hyphenated text:

Voix  ambiguë d’un cœur qui au zéphyr
préfère les jattes de kiwi.  Voyez le
brick  géant  que  j’examine  près du
wharf.  Monsieur Jack, vous  dactylo‐
graphiez  bien  mieux  que  votre ami
Wolf.  Eble ĉiu kvazaŭ‐deca fuŝĥoraĵo
ĝojigos  homtipon..  Laŭ Ludoviko Za‐
menhof bongustas freŝa  ĉeĥa  manĝaĵo
kun  spicoj.   Nechť již hříšné saxo‐
fony  ďáblů  rozezvučí  síň  úděsnými
tóny waltzu, tanga a quickstepu.
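
For repeated use, the groff pipeline above could be wrapped in a small shell function. This is only a sketch: the name `justify` and its arguments are made up for illustration, and it assumes that `groff` and `preconv` are installed and that the input is UTF-8 encoded.

```shell
# justify WIDTH [FILE]
# Hypothetical wrapper around the groff pipeline shown above.
# Reads FILE (or standard input when FILE is omitted) as UTF-8 text
# and prints it justified to WIDTH columns.
justify() {
    justify_width=$1
    justify_file=${2:--}
    {
        printf '.ll %s\n' "$justify_width"   # line length
        printf '.pl 0\n'                     # page length 0, as in the answer
        printf '.nh\n'                       # no hyphenation
        preconv -e utf8 "$justify_file"      # convert UTF-8 input for groff
    } | groff -Tutf8
}
```

It could then be called as `justify 37 orig.txt` or `some_command | justify 37`. Dropping the `.nh` line should give the hyphenated variant.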
clt60
  • **You rock man !** I was studying the `groff` man page and you posted it at the same time. Thanks a lot. – Slander Mar 22 '14 at 21:14

You could use par:

par -j -w37 < inputfile
  • The -j option would justify paragraphs.
  • -w denotes max output line length.

For your input, it'd produce:

This is a simple text file containing
lines  of text  with different  width
but I would like to justify them. Any
idea ?

An alternative would be to use emacs:

emacs -batch inputfile --eval '(set-fill-column 37)' --eval '(fill-region (point-min) (point-max))' -f save-buffer

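As written, this fills the text to 37 columns but leaves it left-aligned rather than justified (passing `'full` as the third argument of `fill-region` should request full justification):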

This is a simple text file containing
lines of text with different width
but I would like to justify them. Any
idea ?
devnull
  • Thanks for your reply, it seems `par` would do the trick... Unfortunately this command is unknown on my system... Have to find out how to get it. – Slander Mar 22 '14 at 18:16
  • @Slander Which platform are you using? Nevertheless, adding another alternative that might work for you. – devnull Mar 22 '14 at 18:26
  • OK, got it. Thanks, but `par` does not work with accented characters! In fact it counts each accented character as 2 or 3 chars, so the justification comes out wrong... Nothing in the man page refers to this... See my update. – Slander Mar 22 '14 at 18:56
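
The "2 or 3 chars" observation would be consistent with a tool that measures width in bytes rather than in characters, whereas `${#VAR}` in the shell counts characters in the current locale. A quick way to see the difference is sketched below; the helper names `count_bytes` and `count_chars` are made up for illustration, and whether `par` really counts bytes is an assumption here, not something its man page is quoted as saying.

```shell
# Illustrative helpers (hypothetical names): size of a string
# measured in bytes versus in characters.
count_bytes() { printf '%s' "$1" | wc -c | tr -d '[:space:]'; }
count_chars() { printf '%s' "$1" | wc -m | tr -d '[:space:]'; }

count_bytes 'abc'; echo    # 3 bytes
count_bytes 'àbc'; echo    # 4 bytes if this text is UTF-8: "à" takes two bytes
count_chars 'àbc'; echo    # should be 3 in a UTF-8 locale, like ${#VAR}
```

A formatter that pads lines using the byte count would treat a line containing accented letters as longer than it looks on screen, which matches the ragged right edge in the question's update.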