I create Word files using the python-docx library. I want to be able to set different parts of the document to different languages. How can the language be set with python-docx? Preferably, I would like to do it at the run level, since I need different languages on the same line (I am creating a dual-language document). However, there does not seem to be any language attribute for runs or for paragraphs.
Viewed 2,696 times
10
-
There is a core property at the document level: https://python-docx.readthedocs.io/en/latest/dev/analysis/features/coreprops.html?highlight=lang – alpert May 01 '16 at 15:57
-
I can't get this to work. As I understand it, the core properties are document-wide, and a property called "Language" does exist (under Properties/Custom), but setting it with e.g. 'core_properties.language = "Spanish"' does not change it in the document. Besides, it is probably not the right language property. There is another language property, available under Tools/Language, which can differ between parts of the text and determines spell checking. That is the one I am after. – NiklasR May 02 '16 at 12:07
-
In python-docx version 0.8.10 there is a property called [language](https://python-docx.readthedocs.io/en/latest/dev/analysis/features/coreprops.html?highlight=lang#properties), yet there is no information on how to use it. – Traxidus Wolf May 01 '20 at 00:14
1 Answer
14
I think the language has to be set via the document styles in word/styles.xml or at the run level. Currently, however, python-docx offers no API support for this task.
Referring to this answer, you can try the following code, which alters the properties in the oxml element objects directly. Paragraph p4 shows the run-level attempt.
(Tested with python-docx==0.8.10 and LibreOffice Writer with German and English language dictionaries.)
Note: The language field in the core document properties is just metadata and is not used for global spell checking.
import docx # python-docx==0.8.10
doc = docx.Document()

# For a new document (document-wide):
# Set the language value in the document's default run properties element.
styles_element = doc.styles.element
rpr_default = styles_element.xpath('./w:docDefaults/w:rPrDefault/w:rPr')[0]
lang_default = rpr_default.xpath('w:lang')[0]
lang_default.set(docx.oxml.shared.qn('w:val'), 'de-DE')

title = doc.add_paragraph('Rechtschreibprüfung', style='Title')
p1 = doc.add_paragraph(
    'Das ist ein deutscher Satz. '
    'Die Rechtschreibprüfung sollte nichts anstreichen.',
    style='Normal'
)

# For existing styles:
# For styles without a language value
# you can append one explicitly by
# iterating over those styles in the document.
for style in doc.styles:
    rpr = style.element.get_or_add_rPr()
    if not rpr.xpath('w:lang'):
        lang = docx.oxml.shared.OxmlElement('w:lang')
        lang.set(docx.oxml.shared.qn('w:val'), 'de-DE')
        lang.set(docx.oxml.shared.qn('w:eastAsia'), 'en-US')
        lang.set(docx.oxml.shared.qn('w:bidi'), 'ar-SA')
        rpr.append(lang)

p2 = doc.add_paragraph(
    'This sentence is written in English. '
    'The automatic spell checking should complain, '
    'because all styles’ language was set to German.',
    style='Quote'
)

# For addressing specific styles:
# Update (or append to) a specific style,
# e.g. in order to use multiple styles
# to handle more than one language per document.
body_style = doc.styles['Body Text']
body_rpr = body_style.element.get_or_add_rPr()
body_lang = body_rpr.xpath('w:lang')[0]
body_lang.set(docx.oxml.shared.qn('w:val'), 'en-US')

p3 = doc.add_paragraph(
    'This sentence is written again in English. '
    'The automatic spell checking should not complain, '
    'because this style’s language now has been set to English.',
    style='Body Text'
)

# Run level:
# For mixing multiple languages
# within the same style per paragraph.
p4 = doc.add_paragraph(style='Body Text')
p4_text = p4.add_run()
p4_text.add_text(
    'On Run Level: This sentence is written once again in English. '
    'Spell check = OK | '
)

# Add a new run with its language
# differing from the style's language value.
p4_text = p4.add_run()
p4_rpr = p4_text.element.get_or_add_rPr()
p4_run_lang = docx.oxml.shared.OxmlElement('w:lang')
p4_run_lang.set(docx.oxml.shared.qn('w:val'), 'de-DE')
p4_run_lang.set(docx.oxml.shared.qn('w:eastAsia'), 'en-US')
p4_run_lang.set(docx.oxml.shared.qn('w:bidi'), 'ar-SA')
p4_rpr.append(p4_run_lang)
p4_text.add_text(
    'Und das ist noch einmal ein deutscher Satz. '
    'Rechtschreibprüfung = okay'
)

doc.save('my-document.docx')

winkelband
-
This answer solves both the document-wide language setting and the per-run setting. Excellent! – Martin Evans Apr 13 '22 at 09:14
-
I opened an issue on GitHub back in 2019: [Setting the document language has no effect](https://github.com/python-openxml/python-docx/issues/727). No response so far. This answer from winkelband solves the problem. +1 from my side! – Ali Apr 24 '22 at 14:43