
In the example below, we have a contiguous array and a view of the same array that is non-contiguous:

```python
import numpy as np

shape = (5, 100)
A = np.arange(np.prod(shape)).reshape(shape)

# Everything is contiguous at this point
assert A.flags.c_contiguous

# Since we're taking a view over the minor (last)
# dimension, we'll have 5 segments of 50
# contiguous elements
assert not A[:, 20:70].flags.c_contiguous

# Each segment is contiguous
for i in range(shape[0]):
    assert A[i, 20:70].flags.c_contiguous
```

The view references 5 segments of 50 contiguous elements.
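To make those segments concrete, here is a small sketch that computes the byte offset and length of each row segment of a 2-D view from its data pointer and strides. The helper names `data_address` and `row_segments` are hypothetical (not NumPy API); the sketch assumes each row is itself contiguous and uses an explicit `int64` dtype so the byte counts are predictable.

```python
import numpy as np


def data_address(a):
    # Address of the array's first element, taken from the array interface.
    return a.__array_interface__["data"][0]


def row_segments(view, parent):
    """Return (byte_offset, nbytes) for each row of the 2-D `view`,
    with offsets measured from the start of `parent`'s buffer.

    Assumes each row is itself contiguous (inner stride == itemsize)."""
    if view.ndim != 2 or view.strides[1] != view.itemsize:
        raise ValueError("expected a 2-D view with contiguous rows")
    start = data_address(view) - data_address(parent)
    row_bytes = view.shape[1] * view.itemsize
    # Consecutive rows are one outer stride apart in memory
    return [(start + r * view.strides[0], row_bytes) for r in range(view.shape[0])]


shape = (5, 100)
A = np.arange(np.prod(shape), dtype=np.int64).reshape(shape)
V = A[:, 20:70]
print(row_segments(V, A))
# [(160, 400), (960, 400), (1760, 400), (2560, 400), (3360, 400)]
```

Each pair is one block that could be handed to a host-registration call on its own.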

If we look at a more general case,

```python
shape = (30, 20, 70, 50)
B = np.arange(np.prod(shape)).reshape(shape)
```

then the following holds:

```python
assert B[0, :, :, :].flags.c_contiguous
assert B[0, 0, :, :].flags.c_contiguous
assert B[0, 0, 0, :].flags.c_contiguous
```
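One way to generalise this, sketched below under the assumption that C (row-major) element order is what matters, is to merge trailing axes for as long as their strides describe a dense block and then iterate over the remaining leading axes. `contiguous_segments` is a hypothetical helper, not part of NumPy, and negative or zero strides are handled conservatively; the `ctypes.string_at` round trip only checks that the segments reproduce the view's bytes.

```python
import ctypes
import itertools

import numpy as np


def contiguous_segments(ary):
    """Return (address, nbytes) pairs, one per contiguous run of memory,
    listed in C (row-major) element order.

    Trailing axes are merged while their strides describe a dense
    row-major block; the remaining leading axes are iterated explicitly.
    Negative or zero (broadcast) strides simply end the merged block."""
    if ary.size == 0:
        return []
    run = ary.itemsize   # bytes covered by the merged trailing block
    split = ary.ndim     # axes [split:] are merged into one run
    for i in reversed(range(ary.ndim)):
        if ary.shape[i] == 1:
            split = i    # length-1 axes never break contiguity
            continue
        if ary.strides[i] != run:
            break
        run *= ary.shape[i]
        split = i
    base = ary.__array_interface__["data"][0]
    outer = [range(n) for n in ary.shape[:split]]
    outer_strides = ary.strides[:split]
    # itertools.product() with no ranges yields a single empty index
    return [
        (base + sum(j * s for j, s in zip(idx, outer_strides)), run)
        for idx in itertools.product(*outer)
    ]


shape = (30, 20, 70, 50)
B = np.arange(np.prod(shape), dtype=np.int64).reshape(shape)

for view in (B[0], B[0, :, 5, :], B[:, :, :, 10:20]):
    segs = contiguous_segments(view)
    # Reading the segments back in order reproduces the view's C-order bytes
    data = b"".join(ctypes.string_at(addr, n) for addr, n in segs)
    assert data == view.tobytes()
    print(view.shape, len(segs), segs[0][1])
# (20, 70, 50) 1 560000
# (20, 50) 20 400
# (30, 20, 70, 10) 42000 80
```

The design choice is to find the coarsest segments first: a fully contiguous view collapses to a single pair, while a strided innermost axis degrades to one pair per element.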

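A partial solution, based on `_UpdateContiguousFlags` in `flagsobject.c`, might be:

```python
def is_contiguous(ary):
    """Mirror the C-contiguity check in _UpdateContiguousFlags: walking
    from the last axis to the first, each stride must equal the bytes
    spanned by all axes to its right."""
    is_c_contig = True
    sd = ary.itemsize  # expected stride for the current axis
    for i in reversed(range(ary.ndim)):
        dim = ary.shape[i]

        # An empty array is always considered contiguous
        if dim == 0:
            return True

        # Length-1 axes can have any stride without affecting the layout
        if dim != 1:
            if ary.strides[i] != sd:
                is_c_contig = False

            sd *= dim

    return is_c_contig
```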

But I don't think this handles cases where the array is transposed, e.g.:

```python
B.transpose(3, 0, 1, 2)
```
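A transposed view like this still occupies a single gap-free block of memory, yet both `c_contiguous` and `f_contiguous` report `False` for it. A minimal sketch of one way to detect that case, assuming the axis order does not matter for the transfer, is to sort the axes by decreasing stride and then apply the same test as `is_contiguous` above. `is_dense_any_order` is a hypothetical name, not NumPy API.

```python
import numpy as np


def is_dense_any_order(ary):
    """Return True if `ary` covers one gap-free block of memory in some
    axis order (e.g. a transposed C-contiguous array).

    Negative and zero (broadcast) strides are treated conservatively and
    make the function return False."""
    if ary.ndim == 0 or ary.size == 0:
        return True
    # Put the axis with the largest stride first and the smallest last
    order = sorted(range(ary.ndim), key=lambda i: abs(ary.strides[i]), reverse=True)
    permuted = ary.transpose(order)
    expected = ary.itemsize
    for n, s in zip(reversed(permuted.shape), reversed(permuted.strides)):
        if n == 1:
            continue  # length-1 axes do not affect the layout
        if s != expected:
            return False
        expected *= n
    return True


shape = (30, 20, 70, 50)
B = np.arange(np.prod(shape), dtype=np.int64).reshape(shape)
Bt = B.transpose(3, 0, 1, 2)
print(Bt.flags.c_contiguous, Bt.flags.f_contiguous, is_dense_any_order(Bt))
# False False True
print(is_dense_any_order(B[:, :, :, 10:20].transpose(3, 0, 1, 2)))
# False
```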

Question: Is there a method of identifying the contiguous segments in a general NumPy view/array?

I need this functionality in order to transfer data from a NumPy array to a GPU without copying data into pinned memory. Instead, I'd like to pin contiguous segments using `cudaHostRegister` and then do the transfer to the GPU. I'm aware of CUDA's support for pitched memory via `cudaMemcpy3D`, but I'd like to handle more general cases.
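If registering each segment separately turns out to be impractical, one alternative (an assumption, not something the question or CUDA requires) is to pin the whole address range a view spans, gaps included, with a single `cudaHostRegister` call. The hypothetical helper `byte_span` below only computes that range from the data pointer and strides; it makes no CUDA calls. NumPy ships a comparable helper, `byte_bounds` (under `numpy.lib.array_utils` as of NumPy 2.0).

```python
import numpy as np


def byte_span(ary):
    """Return (low, high) addresses such that every byte of every element
    of `ary` lies in the half-open range [low, high)."""
    low = high = ary.__array_interface__["data"][0]
    if ary.size == 0:
        return low, low
    for n, s in zip(ary.shape, ary.strides):
        # The extreme index along each axis moves the address by (n - 1) * s
        if s < 0:
            low += (n - 1) * s
        else:
            high += (n - 1) * s
    return low, high + ary.itemsize


A = np.arange(500, dtype=np.int64).reshape(5, 100)
V = A[:, 20:70]
low, high = byte_span(V)
print(high - low, V.nbytes)  # 3600 2000
```

The trade-off is visible in the output: one registration covers 3600 bytes to move 2000 bytes of actual data.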

Edit 1: Added a basic solution to the question, and asked about the transpose case.

Edit 2: Clarified that the goal is to avoid copying data into pinned memory.

Asked by Simon; edited by talonmies.
  • If `A` is `c_contiguous`, then any simple slice on the last dimension will also be: `A[2,10:20]` is, but `A[3,10:20:2]` is not, nor is `A[1:5,3]`. Once you understand the data layout and slicing, you can predict what will be contiguous. – hpaulj Jul 28 '15 at 17:24
  • @hpaulj if you want to flesh that out into an answer I would upvote it. – Robert Crovella Aug 01 '15 at 02:42
  • *"I need this functionality in order to transfer data from a NumPy array to a GPU without creating copies"* - what makes you think that's possible? The array data is either in RAM or VRAM - there's no way to avoid creating a copy when transferring from one to the other. – ali_m Aug 02 '15 at 11:50
  • @ali_m I've made a second edit. Essentially, I'd like to pin contiguous segments of memory for transfer to the GPU, instead of making an extra copy of existing data into a pinned array and *then* doing the transfer. – Simon Aug 03 '15 at 10:41
  • Why don't you just try registering the memory with `pycuda.driver.register_host_memory` and catch the exception if the registration fails? Also note that the transpose case you are wondering about doesn't actually change the underlying storage in any way; it just signals to the API to read the array in a different storage order from the one it is stored in, so you can probably forget about it in this context. – talonmies Aug 05 '15 at 17:18
  • Looks like the 1D and 2D cases are handled by the [`_memcpy_discontig`](https://github.com/inducer/pycuda/blob/dae67bfae98d67a4346821dddde39aecf5f1a95b/pycuda/gpuarray.py#L1088) function recently introduced in [this pull request](https://github.com/inducer/pycuda/pull/76). – Simon Aug 07 '15 at 10:10
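hpaulj's rules of thumb can be checked directly against the flags (the column example is taken to be `A[1:5,3]`). This is only a quick illustration using the same `(5, 100)` array as the question:

```python
import numpy as np

A = np.arange(500).reshape(5, 100)

# Scalar index on the leading axis, unit-step slice on the last axis
print(A[2, 10:20].flags.c_contiguous)    # True
# A step of 2 leaves a gap between consecutive elements
print(A[3, 10:20:2].flags.c_contiguous)  # False
# Selecting a column steps one full row between elements
print(A[1:5, 3].flags.c_contiguous)      # False
print(A[1:5, 3].strides, A.strides)
```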

0 Answers