I have a translatable string in one of my Jinja2 templates:

Project can’t end sooner than it starts

(Note the UTF-8 apostrophe in “can’t”.)

When I extract messages and update my translation files, both the template (.pot) and translation (.po) files have the following msgid:

msgid "Project canât end sooner than it starts"

It seems Babel “translated” the UTF-8 characters as if they were in some kind of 8-bit character set.
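The symptom can be reproduced without Babel at all. The sketch below is only an illustration: it assumes the 8-bit charset was Latin-1, which is consistent with the msgid above but is not confirmed anywhere in Babel's output. The UTF-8 apostrophe is three bytes; read as Latin-1, the first becomes â and the other two become invisible control characters.

```python
# The string from the template; \u2019 is the typographic apostrophe.
original = "Project can\u2019t end sooner than it starts"

# UTF-8 stores the apostrophe as three bytes: E2 80 99.
raw = original.encode("utf-8")

# Assumption: the bytes get decoded as Latin-1 (one byte = one character).
misread = raw.decode("latin-1")

# 0x80 and 0x99 map to C1 control characters, which do not display,
# so only the "â" (0xE2) stays visible.
visible = "".join(ch for ch in misread if ch.isprintable())
print(visible)

# Latin-1 decoding is lossless, so the damage is reversible in principle.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == original)
```

Because nothing is lost in the wrong decoding, the real fix is to tell the extractor which encoding to use rather than to patch the .po files afterwards.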

My babel.cfg is a really short one:

[python: **.py]
[jinja2: **/templates/**.html]
extensions=jinja2.ext.autoescape,jinja2.ext.with_,webassets.ext.jinja2.AssetsExtension

Is there a way to make Babel recognize that the template is already UTF-8, instead of translating it from whatever charset it assumes? I can’t see any related option in the help output of pybabel extract --help.

For the record, I use Python 3 exclusively.

desertnaut
GergelyPolonkai

1 Answer

It turns out this is supported out of the box; it just seems to be undocumented. All I had to do was change the configuration:

[python: **.py]
[jinja2: **/templates/**.html]
encoding=utf-8
extensions=jinja2.ext.autoescape,jinja2.ext.with_,webassets.ext.jinja2.AssetsExtension

The encoding=utf-8 line did the trick: all HTML files are now treated as UTF-8 data.

GergelyPolonkai