
I need some leads on tools in PHP and/or Java (currently Spring + Hibernate) for hyphenating content. Some of my text content is in included files and some is in a database. All text is UTF-8 encoded, and I need soft hyphens, since most browsers support them.

So this stored original:

<p> These words need hyphenation</p>

would turn up something like this

<p> The&shy;se wor&shy;ds need hyp&shy;he&shy;na&shy;tion</p>

in the source of the finally loaded web page.

Any ideas how to achieve this?

Suggestions for text editing tools that can hyphenate within HTML markup would also be welcome, for situations where there is no server-side code in use and only plain HTML source files.

Also, I have yet to find a good source for hyphenation word lists.

MiB
  • You're adding hyphens in randomly? –  Nov 20 '12 at 19:25
  • Dagon, well actually that was just an example of how it could look, depending on the hyphenation rules of the language. In some languages at least, there are several ways one could go about proper hyphenation. With a proper list for the language it would look more accurate, of course. – MiB Nov 21 '12 at 03:36

2 Answers


CSS3 defines client-side hyphenation.

This means that in supporting browsers¹, you only need to specify the language of your text (the lang attribute) and enable automatic hyphenation (hyphens: auto), and the text will be hyphenated without any further work on your part. Obviously, this means that the hyphenation points are controlled by the browser's linguistic resources.

For manual control, you can place discretionary (soft) hyphens at every hyphenation point that you wish to allow and direct the browser to use only those (hyphens: manual).

In practice, to find hyphenation points and insert discretionary hyphens, the best course would probably be to use the venerable TeX-style hyphenation method where subword patterns specifying hierarchical hyphenation or no-hyphenation points are matched against the word to hyphenate. These patterns are now widely used (including by OpenOffice, LibreOffice and Adobe InDesign) and are available for most languages.
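As a rough illustration of the pattern method described above, here is a minimal Java sketch. The class name PatternHyphenator and the leftMin/rightMin parameters are made up for this example, and the handful of patterns shown in the usage note is only an illustrative subset; a real pattern file for a language contains thousands of entries. Digits in a pattern score the gap they sit in, the highest score for each gap wins, and an odd score allows a break.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of TeX-style pattern hyphenation; not taken from any library.
final class PatternHyphenator {
    private static final char SOFT_HYPHEN = '\u00AD';

    // Pattern letters (digits stripped) -> score for each gap around those letters.
    private final Map<String, int[]> patterns = new HashMap<>();
    private final int leftMin;   // minimum characters before the first break
    private final int rightMin;  // minimum characters after the last break

    PatternHyphenator(int leftMin, int rightMin, String... rawPatterns) {
        this.leftMin = leftMin;
        this.rightMin = rightMin;
        for (String raw : rawPatterns) {
            addPattern(raw);
        }
    }

    // "hy3ph" becomes letters "hyph" with scores [0, 0, 3, 0, 0].
    private void addPattern(String raw) {
        StringBuilder letters = new StringBuilder();
        int[] scores = new int[raw.length() + 1];
        for (char c : raw.toCharArray()) {
            if (Character.isDigit(c)) {
                scores[letters.length()] = c - '0';
            } else {
                letters.append(c);
            }
        }
        patterns.put(letters.toString(), Arrays.copyOf(scores, letters.length() + 1));
    }

    // Returns the word with soft hyphens at every permitted break.
    String hyphenate(String word) {
        // Dots mark the word boundaries so patterns can anchor to them.
        StringBuilder dotted = new StringBuilder(".");
        for (int i = 0; i < word.length(); i++) {
            dotted.append(Character.toLowerCase(word.charAt(i)));
        }
        dotted.append('.');

        // gap[i] is the best score for the gap before dotted character i.
        int[] gap = new int[dotted.length() + 1];
        for (int start = 0; start < dotted.length(); start++) {
            for (int end = start + 1; end <= dotted.length(); end++) {
                int[] scores = patterns.get(dotted.substring(start, end));
                if (scores == null) {
                    continue;
                }
                for (int k = 0; k < scores.length; k++) {
                    gap[start + k] = Math.max(gap[start + k], scores[k]);
                }
            }
        }

        StringBuilder out = new StringBuilder();
        for (int j = 0; j < word.length(); j++) {
            boolean farEnoughIn = j >= leftMin && word.length() - j >= rightMin;
            // Word character j is dotted character j + 1; an odd score allows a break.
            if (farEnoughIn && gap[j + 1] % 2 == 1) {
                out.append(SOFT_HYPHEN);
            }
            out.append(word.charAt(j));
        }
        return out.toString();
    }
}
```

With the illustrative patterns hy3ph, he2n, hena4, hen5at, 1na, n2at, 1tio, 2io and o2n, and limits of 2 and 3, calling hyphenate("hyphenation") puts soft hyphens after "hy" and after "phen". Words that match no pattern come back unchanged.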

Implementing the algorithm only takes a few lines of code. What's more, there are ready-made implementations in numerous languages: PHP implementations like phpHyphenator, Java implementations like TeXHyphenator-J or Hyphenation and Java bindings for the C++ implementation of libhyphen like jhyphen.
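Whichever implementation finds the break points, the soft hyphens should only go into the text of the page, not into tag names, attributes or entities. The following Java sketch shows one way to do that, assuming Java 9 or later. The class HtmlSoftHyphens and its method are hypothetical names, and the word hyphenator is passed in as a plain function so that any of the libraries above could be plugged in. The regex is deliberately naive: it does not handle script or style contents, comments, or attribute values containing ">", so a real HTML parser would be the safer choice for arbitrary markup.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: applies a word hyphenator to HTML text only.
final class HtmlSoftHyphens {
    // Tags and entities are matched first so they pass through untouched;
    // group 1 captures runs of letters found outside of them.
    private static final Pattern TOKEN =
            Pattern.compile("<[^>]*>|&#?\\w+;|(\\p{L}+)");

    static String hyphenateText(String html, UnaryOperator<String> wordHyphenator) {
        return TOKEN.matcher(html).replaceAll(match -> {
            String word = match.group(1);
            String replacement = (word == null)
                    ? match.group()                 // tag or entity: keep as-is
                    : wordHyphenator.apply(word);   // plain word: hyphenate
            // Escape '$' and '\' so the result is inserted literally.
            return Matcher.quoteReplacement(replacement);
        });
    }
}
```

The function passed in decides what a break looks like, so it can return the raw U+00AD character or the &shy; entity, whichever suits the output encoding.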

¹ Currently, Firefox, Safari and IE have autohyphenation support; Chrome and Opera do not.

Endre Both

Hyphenation is actually extremely difficult. There aren't really any word lists out there. If you're using PHP, you may be able to make use of the Perl library TeX::Hyphen. I don't know of any Java solutions.

For more information, read this Wikipedia article.

durron597
  • durron597, it seems to me that soft hyphenation is very much needed in many cases to get good typography. Adobe InDesign does automatic hyphenation and must base its algorithms on something. OpenOffice has hyphenation dictionaries, and I assumed they could perhaps be put to use. TeX was an interesting tip. I'll check it out. Thanks. – MiB Nov 21 '12 at 03:51
  • I saw a link on Adobe InDesign while looking for an answer to this, and the whole thread was like "yeah, we don't know how it works, except that the words don't need to be English" – durron597 Nov 21 '12 at 05:09