0

I am new to CSS and web development in general. Hopefully there is a way to accomplish what I am trying to do. What I am trying to do is simple to explain, but I need to give some background info first, sorry for the length of the post.

I have created a webpage that is in the Tibetan language. Tibetan does not have spaces between words, it only has a character called a "tsheg" (་ - U+0F0B) that is used to separate every syllable. It also has a mark called a "shey" (། - U+0F0D) that comes at the end of phrases and clauses and sentences. Although sometimes it is doubled, after a shey is generally a space before the next line of text. When typing in Tibetan this space is represented not as a normal space (U+0020) but instead U+00A0, however when it comes to browsers and HTML/coding in general these two seem to behave the same.

In any Tibetan writing, the ideal aesthetic is for full justification. Traditionally there would be slight spaces placed between the tsheg marks and the shey marks to achieve a perfectly flush left and right alignment. (The exception would be the last line of a text, or a paragraph in contemporary formatting, does not need to be justified). It is acceptable for lines to break mid-word or mid-sentence, but never mid syllable. So the last character on any line is going to be either a tsheg or a shey. It is also not acceptable to start a line with a shey. In the last few years this has been easy to achieve for desktop publishing using MS Word, using "Thai Justification." However that option is not available even in other Office products, never mind outside of the Office environment. Other work-arounds have been to add invisible width characters after every tsheg and shey, allowing for wrapping at any point.

Now comes the question and difficulty. I am using distributed justification, and that seems to be the best option. It does not break syllables up, which is important. But it only wants to break at those spaces after shey marks, and it breaks elsewhere when there is a long string of text without a space, but if there is a space then it breaks there, sometimes stretch one or two syllables across an entire line, which is obviously not ideal.

Now, when coding the HTML of the text I can use the same work-around that is used for desktop publishing pre "Thai justification," I can add a <wbr> after every single tsheg, and this will not be visible to the end user and should allow cleaner breaking. However, there are two problems with this. But inserting that many <wbr> characters I am essentially doubling, or close to doubling, my character count, which can make the page take twice as long to load, even if half of those characters are invisible. However, more important is that it disrupts search functionality. Although you may see the word that has the syllables "AB" for instance, if you tried searching for AB you wouldn't find it, because the HTML sees "AB". And being able to search is kind of critical. Enough so that an ugly formatting is preferable to losing the ability to search and to be indexed properly. Obviously, since I need the site to be responsive and I do not know what size screens will be used I cannot have forced line breaks, either, another trick used when publishing.

So, finally, my question. Is there a way I can define a style or function or some sort of element that automatically associates a certain character--in my case the tsheg character--as having a <wbr> command after it without actually needing to input that command into my HTML? So when the text is justified it treats every tsheg as a <wbr>? I have a class .Tibetan in my stylesheet that defines the font and the justification and so forth, is there some way I can add some code there that achieves what I am looking for?

The one other thing I tried was replacing all of the spaces with &nbsp; which gave a beautiful justified appearance but it also caused the browser to disregard the tsheg marks entirely and it allowed for the cutting in half of syllables.

If you want to see an example of what I am talking about you can visit this page of my site: http://publishing.simplebuddhistmonk.net/index.php/downloads/critical-editions/ and next to the word "English" click the Tibetan characters and that will bring up a paragraph of prose, or you can look here: http://publishing.simplebuddhistmonk.net/index.php/downloads/tibetan/essence-of-dispelling-errors-tib/ (though the formatting on that latter page is less egregious than the former, at least on my screen).

EDIT It looks like the solution this person used might be able to be adapted for my use: Dynamically add <wbr> tag before punctuation however I do not actually understand what I would need to add, and where, to make that work for me. Anyone think that might apply to this scenario? And if so, what code would I add where?

NEW EDIT So, I think the problem might be with the search function that comes from my WordPRess theme. I used my workaround as mentioned above, adding the tag after every tsheg, on this page: http://publishing.simplebuddhistmonk.net/index.php/downloads/tibetan/essence-of-dispelling-errors-tib/ and as you can see, it displays perfectly. But if you search for any phrase from that page using the search function that is up in my header, it will not find it. If you do a Ctrl+F and search on the page, though it will find it. Even if you copy the text from the page and paste it into the search box it still does not find it. Copy the text into a word editor doesn't reveal any hidden or invisible characters. However, if you search for a term from this page http://publishing.simplebuddhistmonk.net/index.php/downloads/tibetan/beautiful-garland-ten-innermost-jewels-tib/ which I have not added the tags to, you will see that it finds it no problem.

So, that leads me to believe the error is in the search function. Any experience with this? Because search is important but I can quite possibly find alternative earch widgets to replace the one that comes with the theme. What is most important though is if you search for a line of text on Google it needs to be found. My site has not been indexed fully by any search engine so I cannot yet confirm if this does or does not affect them.

So.... At this point I wil take any advice I can get. Any advice regarding the original question (is there a way to tell the style guide "if your are displaying X then treat it like X" ) or any idea about this issue with the search functionality, and how the tag may or may not affect search, both from within the site and also from search engines.

Oberon
  • 31
  • 3
  • I have zero knowledge of Tibetan, but I noticed that on the site the attribute `lang="bo"` is missing in many places. – Mr Lister Jan 12 '18 at 09:50
  • I thought that a language signifier would allow me to define style guidelines in the stylesheet, and I am already defining those using a class. Does it make a difference? I can easily add the language attribute within the Div that I already use to contain all of the Tibetan language sections. So that is easy enough, I just didn't think it would do anything... I actually haven't seen any Tibetan language sites that solves this justification problem, most use auto justificaion and are left with a ragged right margin. – Oberon Jan 12 '18 at 10:26
  • Language attributes do have a meaning to the browser. For instance, the [MDN page on hyphens](https://developer.mozilla.org/en-US/docs/Web/CSS/hyphens) says "You must specify a language using the `lang` HTML attribute in order to guarantee that automatic hyphenation is applied in the language of your choice." But like I said, zero knowledge, which includes zero knowledge about how browsers respond to tsheg and shey in different languages. – Mr Lister Jan 12 '18 at 11:20
  • Thanks. I mean, I have gone and added it, so all of the Tibetan language sections are contained within a
    , which will be a useful signifier for search engines even if it doesn't affect the formatting.
    – Oberon Jan 12 '18 at 11:36

0 Answers0