I am trying to write a function in Cython to process a list of strings. In the code below, I am trying to convert a list of unicode str objects (in Python 3) to a table of char*, which is then used to search for substrings.

I found a solution for Python 2 here, but it depends on PyString_AsString, which is available only in Python 2; in Python 3 we are supposed to use PyUnicode_AsUTF8, which I found out about here. When I tried to use PyUnicode_AsUTF8, I ran into this error:

:31:16: 'PyUnicode_AsUTF8' is not a constant, variable or function identifier

I am pretty much out of ideas. Whatever I try causes some sort of error.

The code

import cython
from cpython.mem cimport PyMem_Malloc, PyMem_Realloc, PyMem_Free
from cpython.string cimport PyUnicode_AsUTF8
from libc.string cimport strstr

@cython.boundscheck(False)
def start(itsstr, tokens):
    cdef size_t s
    cdef size_t t
    cdef size_t ns = len(itsstr)
    cdef size_t nt = len(tokens)
    cdef const char** t_str = _char_table(itsstr, ns)
    cdef const char** t_tok = _char_table(tokens, nt)
    cdef unicode x
    for s in xrange(ns):
        for t in xrange(nt):
            if strstr(t_str[s], t_tok[t]):
                x = itsstr[s]
    PyMem_Free(t_str)
    PyMem_Free(t_tok)

cdef const char** _char_table(s, const size_t n):
    cdef char** t = <char**>PyMem_Malloc(n * sizeof(char*))
    cdef size_t i = 0
    for i in xrange(n):
        temp = PyUnicode_AsUTF8(s[i])
        t[i] = temp
    return t
Celdor
  • Does this answer your question? [How to call a CPython's C-API function which doesn't exist in Cython's cpython-headers?](https://stackoverflow.com/questions/67594979/how-to-call-a-cpythons-c-api-function-which-doesnt-exist-in-cythons-cpython-h) – ead May 19 '21 at 08:26

1 Answer

Cython doesn't wrap the function PyUnicode_AsUTF8 in cpython.string, so you have to declare it yourself:

#instead of from cpython.string cimport PyUnicode_AsUTF8
cdef extern from "Python.h":
    const char* PyUnicode_AsUTF8(object unicode)

Strictly speaking, before Python 3.7 the return type was char* rather than const char*, but declaring it with const does not cause Cython any problems on those older versions either.
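If you want to see what PyUnicode_AsUTF8 hands back before compiling anything, a rough check from plain Python is possible through ctypes. This is only a sketch, assuming CPython 3 where `ctypes.pythonapi` exposes the C-API; the helper name `as_utf8` is made up for illustration and is not part of Cython or CPython.

```python
import ctypes

# Assumption: CPython 3, where ctypes.pythonapi gives access to the C-API.
_as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
_as_utf8.argtypes = [ctypes.py_object]
# c_char_p results are copied into a bytes object, up to the first NUL byte.
_as_utf8.restype = ctypes.c_char_p


def as_utf8(text):
    """Hypothetical helper: the UTF-8 bytes PyUnicode_AsUTF8 yields for a str."""
    return _as_utf8(text)


if __name__ == "__main__":
    encoded = as_utf8("héllo wörld")
    print(encoded)
    # Substring search on the encoded bytes, analogous to strstr on char*.
    print(as_utf8("wör") in encoded)
```

Note that the buffer returned by PyUnicode_AsUTF8 is cached inside the str object, so the pointers stored in a char** table are only valid while the original str objects are still alive, and they must not be freed individually.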

ead
  • Thanks. I don't know how to start with this but I'll try to find something! – Celdor Jun 10 '18 at 22:01
  • @Celdor: You just have to replace the `cimport` with the code from the answer, there is nothing more to it... – ead Jun 11 '18 at 06:33
  • I see, I thought I'd have to implement it by myself. Thanks :) I am going to test it as soon as I can – Celdor Jun 11 '18 at 09:17