I have to reshape an ndarray of shape [17205, 21] into [17011, 96, 100, 21] by applying two sliding windows to it.

In: arr
Out: [[ 8.  0.  0. -0.  0.  0.  8.  8.  0.  0.  0.  0.  8.  7.  6.  9.  9.  1.
   1.  1.  2.]
 [ 8.  0.  0. -0.  0.  0.  8.  8.  0.  0.  0.  0.  8.  7.  5.  9.  8.  2.
   1.  1.  2.]
.
.
.
 [ 8.  0.  0. -0.  0.  0.  8.  8.  0.  0.  0.  0.  8.  7.  5.  9.  8.  3.
   1.  1.  2.]]

My solution was to apply the following sliding-window function twice:

import numpy as np

def separate_multi(sequences, n_steps):
    X = list()
    for i in range(len(sequences)):
        # find the end of this window
        end_ix = i + n_steps
        # stop once the window would run past the end of the data
        if end_ix > len(sequences):
            break
        # gather the rows that belong to this window
        seq_x = sequences[i:end_ix, :]
        X.append(seq_x)
    # return only after every window has been collected
    return np.array(X)

The first pass (n_steps=100) gives the shape [17106, 100, 21], and the second pass (n_steps=96) gives the shape [17011, 96, 100, 21].
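The same two passes can also be written as strided views, which reuse the original buffer instead of copying rows. This is only a minimal sketch, assuming NumPy 1.20 or later (where sliding_window_view exists); the helper name double_window_view and the small demo array are illustrative and not part of the original code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def double_window_view(arr, inner=100, outer=96):
    """Return a read-only view of shape (n, outer, inner, features)."""
    # First window over the rows: (rows - inner + 1, features, inner)
    w1 = sliding_window_view(arr, inner, axis=0)
    # Second window over the first axis: (n, features, inner, outer)
    w2 = sliding_window_view(w1, outer, axis=0)
    # Reorder the axes to (n, outer, inner, features); still a view, no copy
    return w2.transpose(0, 3, 2, 1)

# Small stand-in for the real (17205, 21) array
demo = np.arange(20 * 2, dtype=np.float64).reshape(20, 2)
view = double_window_view(demo, inner=4, outer=3)
print(view.shape)  # (15, 3, 4, 2)
```

The view itself costs almost no extra memory, but anything that materializes it in full (for example np.array(view)) would still need the full allocation, so it is best consumed slice by slice.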

DRAWBACK: it holds the whole result in memory, which raises an error:

MemoryError: Unable to allocate 24.3 GiB for an array with shape (17011, 96, 100, 20) and data type float64 
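The figure in the error message can be checked by hand: it is the number of elements times 8 bytes per float64 value. The short calculation below uses the shape exactly as reported in the message; the float32 line is just an illustration of how the dtype affects the total, not something the original code does.

```python
import math

shape = (17011, 96, 100, 20)  # shape reported in the error message
bytes_per_value = 8           # float64

gib = math.prod(shape) * bytes_per_value / 2**30
print(f"float64: {gib:.1f} GiB")      # float64: 24.3 GiB
print(f"float32: {gib / 2:.1f} GiB")  # float32: 12.2 GiB
```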

A possible solution:

import tensorflow as tf
df = tf.data.Dataset.from_tensor_slices(df)
df = df.window(100, shift=1, stride=1, drop_remainder=True)
df = df.window(96, shift=1, stride=1, drop_remainder=True)

However, it doesn't give me the desired output, since "it produces a dataset of nested windows", as stated here.
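A common way to get tensors instead of nested window datasets is to follow each window() with flat_map() and batch(). The sketch below assumes TensorFlow 2.x; the function name double_window_dataset and the small demo array are illustrative, and whether this is fast enough for the real data has not been measured here.

```python
import numpy as np
import tensorflow as tf

def double_window_dataset(arr, inner=100, outer=96):
    ds = tf.data.Dataset.from_tensor_slices(arr)
    # Each window is itself a small dataset; batch() turns it into one tensor
    ds = ds.window(inner, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(inner, drop_remainder=True))
    # Repeat the same step over the (inner, features) tensors
    ds = ds.window(outer, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(outer, drop_remainder=True))
    return ds

# Small stand-in for the real (17205, 21) array
demo = np.arange(20 * 2, dtype=np.float32).reshape(20, 2)
ds = double_window_dataset(demo, inner=4, outer=3)
first = next(iter(ds)).numpy()
print(first.shape)  # (3, 4, 2)
```

Elements are produced lazily, so the full [17011, 96, 100, 21] array is never built at once.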

Any idea? Thanks

Marlon Teixeira

1 Answer

I found a solution to my question:

The main problem wasn't reshaping the data in two steps, but the size of the objects created by the reshaping. The solution was therefore to break the input array into pieces. For that purpose I designed the following function:

def split_chunks(sequence, chunk=3000):
    list_seq = []
    # step through the array in strides of `chunk` rows
    for start in range(0, len(sequence), chunk):
        # slicing past the end is safe, so the last piece keeps every remaining row
        seq = sequence[start:start + chunk, :]
        list_seq.append(seq)
    return list_seq

Then apply the windowing to each array in list_seq. Another option is NumPy's np.split(); however, in my case my function was 9 times faster.
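One caveat with non-overlapping chunks is that windows straddling a chunk boundary are lost. A possible variation, sketched below under the assumption of NumPy 1.20 or later, lets each chunk read inner + outer - 2 extra rows so that the chunks together reproduce every window. The generator name windowed_chunks and the demo sizes are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windowed_chunks(arr, inner=100, outer=96, chunk=3000):
    """Yield arrays of shape (<=chunk, outer, inner, features), one at a time."""
    overlap = inner + outer - 2            # extra rows needed after each chunk
    n_out = len(arr) - overlap             # total number of double windows
    for start in range(0, n_out, chunk):
        stop = min(start + chunk, n_out)
        block = arr[start:stop + overlap]  # rows covering outputs start..stop-1
        w = sliding_window_view(block, inner, axis=0)
        w = sliding_window_view(w, outer, axis=0).transpose(0, 3, 2, 1)
        yield np.ascontiguousarray(w)      # materialize only this chunk

# Small stand-in for the real (17205, 21) array
demo = np.arange(20 * 2, dtype=np.float64).reshape(20, 2)
parts = list(windowed_chunks(demo, inner=4, outer=3, chunk=4))
print([p.shape for p in parts])
```

With the real window sizes, a chunk of 3000 outputs with 21 float64 features still takes roughly 4.5 GiB, so a smaller chunk value may be needed depending on the available memory.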

Marlon Teixeira