how to extract a unicode string with boost.python

Question

It seems that my code crashes when I call extract<const char*>("a unicode string").

Anyone know how to solve this?

Don't have a definitive answer, but [here](http://mail.python.org/pipermail/cplusplus-sig/2009-July/014720.html) and [here](http://mail.python.org/pipermail/cplusplus-sig/2009-July/014664.html) I found some references that might be of interest to you — mac, Jul 08 '11 at 10:57
Is this the error you are getting? "TypeError: No registered converter was able to extract a C++ pointer to type char from this Python object of type unicode." Could you give example code and/or an idea of what you are trying to do? — James Hurford, Jul 11 '11 at 02:50
Can you clarify the question? It is really not clear. What is the argument you give to extract? Is it a literal string? A boost::python::object? — eudoxos, Jul 12 '11 at 14:33
You are supposed to accept the correct answer to your questions. — o0'., Sep 18 '11 at 14:46

André Anjos · Answer 1 · answered Feb 24 '13 at 05:56

This compiles and works for me, with your example string and using Python 2.x:

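#include <boost/python.hpp>
#include <iostream>

void process_unicode(boost::python::object u) {
  using namespace boost::python;
  // `value` points into the buffer of `encoded`, so `encoded` must outlive it.
  object encoded = str(u).encode("utf-8");
  const char* value = extract<const char*>(encoded);
  std::cout << "The string value is '" << value << "'" << std::endl;
}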

You can write a specific from-python converter if you wish to auto-convert PyUnicode (in Python 2.x) to const wchar_t* or to a type from ICU (that seems to be the common recommendation for dealing with Unicode in C++).

If you want full support for Unicode characters outside the ASCII range (for example, accented characters such as á, ç or ï), you will need to write the from-python converter. Note that this will have to be done separately for Python 2.x and 3.x if you wish to support both. In Python 3.x the separate unicode type is gone: the built-in str type is what unicode was in Python 2.x (at the C API level it is still a PyUnicode object). ~~Nothing that a couple of #if PY_VERSION_HEX >= 0x03000000 cannot handle~~.

[edit]

The struck-through remark above was wrong. Since Python 3.x treats Unicode strings as normal strings, boost::python will wrap them into boost::python::str objects. I have not verified how those are handled with respect to Unicode translation in this case.

score 1 · Answer 2 · answered Aug 23 '11 at 08:02


Have you tried

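std::string s = extract<std::string>(obj);  // obj: the Python object holding the text
const char* p = s.c_str();                  // p stays valid only while s is alive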

or

extract<wchar_t*>(...)


edvaldig

