
Passing a numpy array typed as np.float64_t works fine (see below), but I can't pass string arrays.

This is what works:

# cython_testing.pyx
import numpy as np
cimport numpy as np

ctypedef np.float64_t dtype_t 

cdef func1 (np.ndarray[dtype_t, ndim=2] A):
    print A 

def testing():
    chunk = np.array ( [[94.,3.],[44.,4.]], dtype=np.float64)

    func1 (chunk)

But I can't make this work: I can't find the matching 'type identifiers' for numpy string dtypes.

# cython_testing.pyx
import numpy as np
cimport numpy as np

ctypedef np.string_t dtype_str_t 

cdef func1 (np.ndarray[dtype_str_t, ndim=2] A):
    print A 

def testing():
    chunk = np.array ( [['huh','yea'],['swell','ray']], dtype=np.string_)

    func1 (chunk)

The compilation error is:

Error compiling Cython file:
------------------------------------------------------------
ctypedef np.string_t dtype_str_t 
    ^
------------------------------------------------------------

cython_testing.pyx:9:9: 'string_t' is not a type identifier

UPDATE

Looking through numpy.pxd, I see the following ctypedef statements. Maybe that means I can use uint8_t and pretend everything is normal, as long as I can do some casting?

ctypedef unsigned char      npy_uint8
ctypedef npy_uint8      uint8_t

Just have to see how expensive that casting will be.
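On the NumPy side, the reinterpretation can be sketched without any copying, which may help when estimating the cost. This is only a plain-Python illustration, assuming a fixed-width byte-string array (`np.bytes_`, the current name for `np.string_`); the names `as_bytes` and `round_trip` are made up for the example, and it does not show the Cython-side typing itself.

```python
import numpy as np

# Fixed-width byte strings; the longest entry ('swell') makes the dtype S5.
chunk = np.array([['huh', 'yea'], ['swell', 'ray']], dtype=np.bytes_)

# Reinterpret the same buffer as raw bytes: one extra axis of length itemsize.
# view() does not copy; shorter strings are padded with zero bytes.
as_bytes = chunk.view(np.uint8).reshape(chunk.shape + (chunk.dtype.itemsize,))

# Going back is also just a view over the same memory.
round_trip = as_bytes.view(chunk.dtype).reshape(chunk.shape)

print(as_bytes.shape)                      # one row of bytes per string
print(np.shares_memory(chunk, as_bytes))   # whether the buffer is shared
```

A `uint8` array of this shape is something a typed buffer argument (for example `np.ndarray[np.uint8_t, ndim=3]`) should be able to accept, though that part is an assumption and is not tested here.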

HeyWatchThis

2 Answers


With Cython 0.20.1 it works using cdef np.ndarray, without specifying the data type and the number of dimensions:

import numpy as np
cimport numpy as np

cdef func1(np.ndarray A):
    print A

def testing():
    chunk = np.array([['huh','yea'], ['swell','ray']])
    func1(chunk)
Saullo G. P. Castro
  • @TedPetrou I am trying to build an example where the `dtype=object` would accelerate in order to update the answer, but up to now I found it to be equivalent to not specifying `dtype`. How did you measure the 100x speed up? – Saullo G. P. Castro Feb 14 '18 at 08:24
  • Looks like I massively misspoke in my previous comment. It looks like I am getting 5x improvement by changing to object. Use this array. `a = np.array(['some', 'strings', 'in', 'an', 'array'] * 10 ** 5)` – Ted Petrou Feb 14 '18 at 15:42
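For reference, the conversion discussed in these comments can be sketched in plain Python as follows. This is a hedged illustration only: it uses the array from the comment above, the name `a_obj` is made up, and it shows the difference in element types rather than reproducing any timing.

```python
import numpy as np

# The array from the comment: NumPy infers a fixed-width unicode dtype.
a = np.array(['some', 'strings', 'in', 'an', 'array'] * 10 ** 5)

# Convert to an object array: each element is now a reference to a Python str.
a_obj = a.astype(object)

# Indexing the fixed-width array builds a NumPy scalar on each access,
# while the object array hands back the stored Python string.
print(type(a[0]), type(a_obj[0]))
```

One plausible reason for the reported difference is that the object array avoids creating a new scalar per element access, but that is an assumption and should be checked with a profiler on the actual workload.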

Looks like you're out of luck.

http://cython.readthedocs.org/en/latest/src/tutorial/numpy.html

Some data types are not yet supported, like boolean arrays and string arrays.


This answer is no longer valid as shown by Saullo Castro's answer, but I'll leave it for historical purposes.

JAB
  • Thanks. I upvoted your answer. Though I hope there is a work around by using perhaps the Numpy Structured array [http://docs.scipy.org/doc/numpy/user/basics.rec.html#structured-arrays]. But I am still looking for how to pass one of those too. – HeyWatchThis Jun 12 '12 at 21:00
  • At least for my purposes, using cProfile, it looks like you can still pass Numpy arrays w/o typing, in Cython. But you do not get the Cython optimizations described in your readthedocs.org reference. – HeyWatchThis Jun 12 '12 at 22:47
  • Being able to use them slowly is still better than not being able to use them at all, though, right? – JAB Jun 13 '12 at 13:10
  • Content of this link has been modified. The quote doesn't exist. – gzc Feb 24 '17 at 12:32