I am trying to write a function in Cython to process a list of strings. In the code below, I am trying to convert a list of unicode str objects (in Python 3) to a table of char*, which is then used to search for substrings.
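For reference, the behaviour I am after can be sketched in plain Python roughly as follows. The name find_matches and the choice to return the matching strings are assumptions made for illustration only; the Cython version is meant to do the same comparison on char* data.

```python
def find_matches(strings, tokens):
    """Return the strings that contain at least one token (compared as UTF-8 bytes)."""
    # Encode everything once up front, mirroring the char* tables in the Cython code.
    encoded_strings = [s.encode("utf-8") for s in strings]
    encoded_tokens = [t.encode("utf-8") for t in tokens]

    matches = []
    for original, raw in zip(strings, encoded_strings):
        # `tok in raw` on bytes is a substring test, comparable to a
        # non-NULL result from strstr(raw, tok).
        if any(tok in raw for tok in encoded_tokens):
            matches.append(original)
    return matches
```

For example, find_matches(["hello world", "foo"], ["wor"]) should pick out only "hello world".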
I found a solution for Python 2 here, but that solution depends on PyString_AsString, which is available only in Python 2, whereas in Python 3 we are supposed to use PyUnicode_AsUTF8, which I found out about here. When I tried to use PyUnicode_AsUTF8, I ran into this error:
:31:16: 'PyUnicode_AsUTF8' is not a constant, variable or function identifier
I am pretty much out of ideas. Whatever I try causes some sort of error.
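The function itself does appear to exist in the Python 3 C API, so the error seems to be about how Cython sees the declaration rather than about Python. A rough way to check this from plain Python, assuming CPython 3 and that ctypes.pythonapi exposes the symbol (utf8_of is a hypothetical helper name, not part of any library):

```python
import ctypes

# Look up the C API function in the running interpreter.
# Assumption: this is CPython 3, where PyUnicode_AsUTF8 is exported.
_as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
_as_utf8.argtypes = [ctypes.py_object]  # takes a Python str object
_as_utf8.restype = ctypes.c_char_p      # returns a NUL-terminated UTF-8 buffer


def utf8_of(text):
    """Return the UTF-8 bytes of a str, as seen through PyUnicode_AsUTF8."""
    # ctypes copies the C string into a Python bytes object.
    return _as_utf8(text)
```

If utf8_of("abc") gives back the same bytes as "abc".encode("utf-8"), the symbol is available in the interpreter and only the Cython-side declaration is missing.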
The code
import cython
from cpython.mem cimport PyMem_Malloc, PyMem_Realloc, PyMem_Free
from cpython.string cimport PyUnicode_AsUTF8
from libc.string cimport strstr

@cython.boundscheck(False)
def start(itsstr, tokens):
    cdef size_t s
    cdef size_t t
    cdef size_t ns = len(itsstr)
    cdef size_t nt = len(tokens)
    cdef const char** t_str = _char_table(itsstr, ns)
    cdef const char** t_tok = _char_table(tokens, nt)
    cdef unicode x
    for s in xrange(ns):
        for t in xrange(nt):
            if strstr(t_str[s], t_tok[t]):
                x = itsstr[s]
    PyMem_Free(t_str)
    PyMem_Free(t_tok)

cdef const char** _char_table(s, const size_t n):
    cdef char** t = <char**>PyMem_Malloc(n * sizeof(char*))
    cdef size_t i = 0
    for i in xrange(n):
        temp = PyUnicode_AsUTF8(s[i])
        t[i] = temp
    return t