
I'm trying to process some BibTeX entries converted to an XML tree via Pybtex. I'd like to convert all the special characters from their LaTeX escapes to Unicode characters using latexcodec. I have checked the syntax against the question "Does pybtex support accent/special characters in .bib file?" and the documentation, but I am not getting the correct output.

>>> import latexcodec
>>> name = 'Br\"{u}derle'
>>> name.decode('latex')
u'Br"{u}derle'

I have tested this across different strings and special characters, and it always just strips off the backslash without translating the character. Should I be using latexcodec differently to get the correct output?

Sean K.

1 Answer


Your backslash is not included in the string at all because it is treated as a string escape, so the codec never sees it:

>>> print 'Br\"{u}derle'
Br"{u}derle

Use a raw string:

name = r'Br\"{u}derle'
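To see the difference without involving the codec at all, here is a small sketch using only built-in string operations. The variable names are just for illustration; it compares the plain literal, the raw literal, and a literal with a doubled backslash.

```python
# Plain literal: \" is an escape sequence for a double quote,
# so no backslash ends up in the string.
plain = 'Br\"{u}derle'

# Raw literal: the backslash is kept as an ordinary character.
raw = r'Br\"{u}derle'

# Doubling the backslash in a plain literal gives the same string as the raw one.
escaped = 'Br\\"{u}derle'

print(plain)                        # Br"{u}derle
print(raw)                          # Br\"{u}derle
print('\\' in plain, '\\' in raw)   # False True
print(len(plain), len(raw))         # 11 12
print(escaped == raw)               # True
```

Only the raw (or double-backslash) form contains the `\"` sequence that a LaTeX decoder needs to see.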

Alternatively, try reading actual data from a file, in which case the raw/non-raw distinction will not matter. (The distinction only applies to literal strings in Python source code.)
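As a rough check of that point, the sketch below writes one BibTeX-style line to a temporary file and reads it back. The file contents and the `.bib` suffix are made up for the example; the point is only that text read from a file arrives with its backslash intact.

```python
import os
import tempfile

# Write a sample line to a temporary file. The doubled backslash is only
# needed here because this is a literal in Python source code.
with tempfile.NamedTemporaryFile('w', suffix='.bib', delete=False,
                                 encoding='utf-8') as f:
    f.write('author = {Br\\"{u}derle}\n')
    path = f.name

# Read it back: no escape processing happens on data read from a file.
with open(path, encoding='utf-8') as f:
    line = f.read()
os.remove(path)

has_backslash = '\\' in line
print(line)            # author = {Br\"{u}derle}
print(has_backslash)   # True
```

The same should hold for strings fetched from a database or any other external source, since escape processing is a feature of literals in source code, not of string data.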

BrenBarn
  • Thanks. Not sure why I missed that. I'm actually pulling it from a sqlite db but I wanted to test the process first in IDLE. (Will accept and edit ASAP) – Sean K. Dec 20 '13 at 20:14