
I have a working docx generator which works fine for European languages, and I'm trying to add complex script support. I found another question with some recipes to try: python-docx add_style with CTL (Complex text layout) language

I managed to get it working so that complex-script text comes out in the correct typeface and size, but I can't get bidirectional (right-to-left) text working. The obvious "x.font.rtl = True" doesn't work, and neither does the spell given in the other post ("lang.set(qn('w:bidi'),'fa-IR')"). I had to take out the line "rpr.get_or_add_sz()" from his recipe because it left me with an unreadable file, but everything else works without it and I don't think that it's related to this problem.

Here is the style as it appears in the generated document's styles.xml file:

<w:style w:styleId="Hebrew" w:type="paragraph" w:customStyle="1">
    <w:name w:val="Hebrew"/>
    <w:basedOn w:val="Normal"/>
    <w:pPr>
        <w:jc w:val="right"/>
    </w:pPr>
    <w:rPr>
        <w:rFonts w:cs="Arial"/>
        <w:rtl/>
        <w:szCs w:val="24"/>
        <w:lang w:bidi="he-IL"/>
    </w:rPr>
</w:style>

Can anyone advise me on what to do to get paragraphs in right-to-left languages working?

kjhughes
user1636349
  • I haven't done anything with python-docx since shortly after that post (almost 2 years ago), so I don't remember much about it (it might even have changed the way it does some things since then). But comparing my xml in the "Xml explanation" part with yours, I can see that you don't have a "w:val" before "w:bidi". Maybe you should add that via code too. I think my base.docx file had that so I didn't need to add it via code. Also *maybe* you need ascii and hAnsi too in your rFonts, like my example. Also please add the python-docx tag to your question so relevant people can find this post. – Arash Rohani Mar 20 '19 at 16:07
  • Thanks, but what should the "w:val" look like? What would I need to do to insert it? I don't see anything in your example to guide me. – user1636349 Mar 20 '19 at 16:49
  • It is in my example (read it again). Mine was this: w:val="en-Us". Add it like I did w:bidi. `lang.set(qn('w:val'),'en-Us')` – Arash Rohani Mar 20 '19 at 17:17
  • Ah, OK, found it. I've now got but none of the changes made any difference -- the para still shows up as LTR. – user1636349 Mar 20 '19 at 17:29
  • does `rpr.get_or_add_sz()` still make an unreadable file? – Arash Rohani Mar 20 '19 at 17:33
  • my base.docx "normal" paragraph style: ` ` – Arash Rohani Mar 20 '19 at 17:53
  • Throw it into an online XML beautifier so you could read it better. I can at least see 1 thing different and that is: "" inside tag. – Arash Rohani Mar 20 '19 at 17:56
  • When it didn't work, I started with "document = Document(); document.add_paragraph(); document.save()" and added one line at a time before the add_paragraph(), trying to open the result each time. I got as far as "rpr.get_or_add_sz()" before it broke. – user1636349 Mar 20 '19 at 18:14
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/190385/discussion-between-roar-and-user1636349). – Arash Rohani Mar 20 '19 at 18:15
  • First let me say that adding the w:bidi to w:pPr did the trick! Thank you! (I have another issue, though...) – user1636349 Mar 20 '19 at 18:35
  • Did you see my last post in the chat? – Arash Rohani Mar 22 '19 at 04:21
  • Yes, thanks. Everything now works properly! – user1636349 Mar 23 '19 at 13:27

1 Answer


As per the comments above, and with much help from ROAR (thanks, ROAR!) I got everything working.

ROAR's recipe here worked perfectly except that calling rpr.get_or_add_sz() gave me an unreadable .docx file. Leaving it out made everything work and didn't appear to cause any problems. The crucial missing link was to add the following to <w:pPr> in the style:

<w:bidi w:val="1"/>
<w:jc w:val="both"/>

There is a my_style.get_or_add_pPr() method to get a reference to the <w:pPr> section of the style, and the code is then similar to the code for updating <w:rPr>:

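from docx.oxml import OxmlElement
from docx.oxml.ns import qn

# ppr is the <w:pPr> element returned by get_or_add_pPr()
# Reuse existing <w:bidi> / <w:jc> children if present, otherwise create them
bidi = ppr.find(qn('w:bidi'))
if bidi is None:
  bidi = OxmlElement('w:bidi')
jc = ppr.find(qn('w:jc'))
if jc is None:
  jc = OxmlElement('w:jc')
bidi.set(qn('w:val'),'1')
jc.set(qn('w:val'),'both')
# append() moves an element that is already a child, so <w:bidi> ends up directly before <w:jc>
ppr.append(bidi)
ppr.append(jc)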
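
To check that the change actually reached the saved file, it can help to inspect the style in styles.xml directly, as was done in the question. The following is only a rough sketch using the standard library rather than python-docx; the function name style_is_bidi is made up for illustration, and it assumes the usual WordprocessingML namespace URI.

```python
import xml.etree.ElementTree as ET

# Assumed: the standard WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def style_is_bidi(styles_xml, style_id):
    """Return True if the named style's <w:pPr> holds an enabled <w:bidi>.

    Hypothetical helper for illustration; styles_xml is the content of
    word/styles.xml (str or bytes), style_id is e.g. "Hebrew".
    """
    w = "{%s}" % W_NS
    root = ET.fromstring(styles_xml)
    for style in root.iter(w + "style"):
        if style.get(w + "styleId") != style_id:
            continue
        bidi = style.find(w + "pPr/" + w + "bidi")
        if bidi is None:
            return False
        # A missing w:val counts as "on"; "0", "false" and "off" switch it off.
        return bidi.get(w + "val", "1") not in ("0", "false", "off")
    return False  # no style with that id
```

A .docx file is a zip archive, so the XML can be read with zipfile.ZipFile(path).read("word/styles.xml") and passed straight in. For the style shown in the question (which has <w:rtl/> in <w:rPr> but no <w:bidi> in <w:pPr>) this would report False.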

The final thing I needed was to deal with mixed-language text, which I did by breaking the text into multiple runs. The paras of Hebrew text I was dealing with were given the modified style with rtl=True, but I split out any ASCII sequences which started and ended with a letter:

[A-Za-z][\u0020-\u007e]*[A-Za-z]

into separate runs with rtl=False.
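
The splitting step can be sketched roughly as below. This is not the exact code I used, just a minimal illustration of the idea; the names LTR_CHUNK and split_runs are made up, and the second class in the pattern is written as [A-Za-z] on the assumption that "ends with a letter" means any ASCII letter.

```python
import re

# An ASCII sequence that starts and ends with a letter (at least two characters).
LTR_CHUNK = re.compile(r"[A-Za-z][\u0020-\u007e]*[A-Za-z]")


def split_runs(text):
    """Split text into (chunk, rtl) pairs in document order.

    Chunks matching LTR_CHUNK get rtl=False; everything in between,
    including surrounding spaces and punctuation, gets rtl=True.
    """
    runs = []
    pos = 0
    for match in LTR_CHUNK.finditer(text):
        if match.start() > pos:
            runs.append((text[pos:match.start()], True))
        runs.append((match.group(), False))
        pos = match.end()
    if pos < len(text):
        runs.append((text[pos:], True))
    return runs
```

Each pair can then become its own run, for example with run = paragraph.add_run(chunk) followed by run.font.rtl = rtl. Note that a lone ASCII letter does not match the pattern, so it stays in the surrounding right-to-left run.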

user1636349