Has anybody tried extracting a char* from a Python 3 PyObject* whose type name is str? Normal strings in Python 3 have the type name str in the C API.

For Python 2 one can use PyString_Check() and PyString_AsStringAndSize(), both declared in the header stringobject.h.

For Python 3 this header is not present; instead there are bytesobject.h and unicodeobject.h. For the latter, using D, I put together

private const(char)[] toChars(PyObject* value) {
    import deimos.python.unicodeobject : PyUnicode_Check;
    if (PyUnicode_Check!()(value)) {
        Py_ssize_t size;
        const char* s = PyUnicode_AsUTF8AndSize(value, &size);
        return s[0 .. size];
    }
    ...
}
// https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsUTF8AndSize
const(char)* PyUnicode_AsUTF8AndSize(PyObject* unicode, Py_ssize_t* size);

but the check doesn't match when value is a standard Python 3 string of type str (a literal, or one created via str(...)). Moreover, I couldn't come up with a way to construct a "unicode string" in Python 3 that matches PyUnicode_Check in the C API. I'm utterly confused. I could try converting it to a bytes object and using the functions in bytesobject.h, but that doesn't seem right either.

I've also tried

PyBytes_AsStringAndSize(PyObject* obj, char** s, Py_ssize_t* len);

but that fails, complaining that obj is not of type bytes.

Nordlöw

1 Answer

You can use PyArg_ParseTuple() with the "s" format.

For example, in C:

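/* Note: PyArg_ParseTuple() expects _value to be the argument tuple
   passed to an extension function, not the str object itself. */
const char * toChars(PyObject * _value)
{
  const char *value;
  if (!PyArg_ParseTuple(_value, "s", &value))
      return NULL; /* fails, with a Python exception set */
  return value; /* points into memory owned by the str object */
}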

This assumes the object outlives every use of the returned string; otherwise you may need to bump its reference count.

I think this works with all versions of Python and I have used it with 3.8.10.

PyUnicode_FromString() will work if you need to supply a new string to Python.
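
If you want to see what the C API returns for a given object before wiring it into an extension, one option is to call it from Python itself. The following is only a rough sketch, assuming CPython (where `ctypes.pythonapi` exposes the interpreter's C API); `utf8_view` is a hypothetical helper name, not part of any library. It calls `PyUnicode_AsUTF8AndSize()` on an object and copies out exactly `size` bytes.

```python
import ctypes

# Assumption: running on CPython, where ctypes.pythonapi gives access to the C API.
# C signature: const char *PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
as_utf8.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
# Keep the raw address; a c_char_p result would be cut at the first NUL byte.
as_utf8.restype = ctypes.c_void_p


def utf8_view(value):
    """Hypothetical helper: copy the UTF-8 bytes the C API exposes for `value`."""
    size = ctypes.c_ssize_t()
    # If the C function sets an exception, ctypes.pythonapi re-raises it here.
    ptr = as_utf8(value, ctypes.byref(size))
    return ctypes.string_at(ptr, size.value)


if __name__ == "__main__":
    print(utf8_view("Nordlöw"))
    try:
        utf8_view(b"raw bytes")
    except TypeError as exc:
        print("bytes rejected:", exc)
```

On the interpreters I would expect this to run on, a plain `str` is accepted and a `bytes` object is rejected with a `TypeError`, which at least suggests that ordinary Python 3 strings are the "unicode" objects this function wants. It does not explain why a particular binding's `PyUnicode_Check` fails.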

artless noise
  • I believe there are all sorts of locale and other compositional issues (a PyObject is multiple objects, etc.). I tried `PyUnicode_AsUTF8AndSize()` and gave up. – artless noise Sep 14 '22 at 21:29