
I am using this for loop to split a dataset into groups, but converting the list `y` into an array raises a warning.

import numpy as np

def to_sequences(dataset, seq_size=1):
    x = []
    y = []

    for i in range(len(dataset)-seq_size):
        window = dataset[i:(i+seq_size), 0]
        x.append(window)
        window2 = dataset[(i+seq_size):i+seq_size+5, 0]
        y.append(window2)

    return np.array(x),np.array(y)

seq_size = 5 
trainX, trainY = to_sequences(train, seq_size)
print("Shape of training set: {}".format(trainX.shape))
print("Shape of training targets: {}".format(trainY.shape))

And this is the warning I get:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray. return np.array(x),np.array(y)

I couldn't figure out why it works for `x` and not for `y`. Any ideas?

aaossa
jkmp
  • Why do you say "it is working for 'x' and not for 'y'"? It seems to me that `y` should be the problem. Did you try the suggested solution of adding `dtype=object` to the `np.array(x)` call? – aaossa Feb 14 '22 at 19:49
  • Does this answer your question? [Debugging Numpy VisibleDeprecationWarning (ndarray from ragged nested sequences)](https://stackoverflow.com/questions/63097829/debugging-numpy-visibledeprecationwarning-ndarray-from-ragged-nested-sequences) – aaossa Feb 14 '22 at 19:49
  • It gives the expected output for X, like this: array([[1.6417541e-04, 1.8490013e-04, 5.3410418e-05, 8.7562017e-05, 7.6301396e-05], [1.8490013e-04, 5.3410418e-05, 8.7562017e-05, 7.6301396e-05, 9.8595303e-04], – jkmp Feb 14 '22 at 20:02
  • but for y it gives this, even after converting the type: array([array([[0.00098595], [0.00388295], [0.00851235], [0.01531321], [0.01527738]], dtype=float32), array([[0.00388295], [0.00851235], [0.01531321], [0.01527738], [0.02505753]], dtype=float32), – jkmp Feb 14 '22 at 20:03
  • I think you should add those to your question. That's the relevant part: I initially thought your problem was the warning, but it's actually the output. – aaossa Feb 14 '22 at 20:04
  • What's the shape of `window2`? – aaossa Feb 14 '22 at 20:09
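For reference, here is a small sketch of what the warning's suggested `dtype=object` actually does, using a made-up ragged list rather than the question's data. It only silences the warning: the result is still a 1-D object array holding separate arrays, which is the kind of output shown in jkmp's comment. (As far as I know, NumPy 1.24 and later raise a `ValueError` instead of the warning when `dtype=object` is left out.)

```python
import numpy as np

# A made-up ragged list of 1-D arrays, similar to the tail of the question's y list
pieces = [np.array([1, 2, 3]), np.array([2, 3]), np.array([3])]

# dtype=object suppresses the warning, but the result is a 1-D object
# array whose elements are the original arrays, not a 2-D numeric array
y = np.array(pieces, dtype=object)
print(y.shape, y.dtype)  # (3,) object
```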

1 Answer

In [247]: dataset = np.arange(20)
In [248]: def to_sequences(dataset, seq_size=1):
     ...:     x = []
     ...:     y = []
     ...:     for i in range(len(dataset)-seq_size):
     ...:         window = dataset[i:(i+seq_size), 0]
     ...:         x.append(window)
     ...:         window2 = dataset[(i+seq_size):i+seq_size+5, 0]
     ...:         y.append(window2)
     ...:     return np.array(x),np.array(y)
     ...: 

And a sample run:

In [250]: to_sequences(dataset[:,None], 5)
<ipython-input-248-176eb762993c>:9: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return np.array(x),np.array(y)
Out[250]: 
(array([[ 0,  1,  2,  3,  4],
        [ 1,  2,  3,  4,  5],
        [ 2,  3,  4,  5,  6],
        [ 3,  4,  5,  6,  7],
        [ 4,  5,  6,  7,  8],
        [ 5,  6,  7,  8,  9],
        [ 6,  7,  8,  9, 10],
        [ 7,  8,  9, 10, 11],
        [ 8,  9, 10, 11, 12],
        [ 9, 10, 11, 12, 13],
        [10, 11, 12, 13, 14],
        [11, 12, 13, 14, 15],
        [12, 13, 14, 15, 16],
        [13, 14, 15, 16, 17],
        [14, 15, 16, 17, 18]]),
 array([array([5, 6, 7, 8, 9]), array([ 6,  7,  8,  9, 10]),
        array([ 7,  8,  9, 10, 11]), array([ 8,  9, 10, 11, 12]),
        array([ 9, 10, 11, 12, 13]), array([10, 11, 12, 13, 14]),
        array([11, 12, 13, 14, 15]), array([12, 13, 14, 15, 16]),
        array([13, 14, 15, 16, 17]), array([14, 15, 16, 17, 18]),
        array([15, 16, 17, 18, 19]), array([16, 17, 18, 19]),
        array([17, 18, 19]), array([18, 19]), array([19])], dtype=object))

The first array is (n,5) with int dtype. The second is object dtype, containing arrays. Most of those arrays are (5,), but the last ones are (4,), (3,), (2,) and (1,).
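A quick way to confirm this on your own data is to print the length of each target window before stacking. This sketch reuses the toy input from the run above (`np.arange(20)` as a single column) and the question's `y` slicing:

```python
import numpy as np

# The toy input from the sample run: the numbers 0..19 as a (20, 1) column
dataset = np.arange(20)[:, None]
seq_size = 5

y = []
for i in range(len(dataset) - seq_size):
    # Same target slice as in the question
    y.append(dataset[(i + seq_size):i + seq_size + 5, 0])

lengths = [len(w) for w in y]
print(lengths)  # [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 2, 1]
```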

`dataset[(i+seq_size):i+seq_size+5, 0]` slices past the end of `dataset`. Python/numpy allows that, but the result is truncated.
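The same truncation with a plain numpy array, as a minimal illustration (the array here is just `np.arange(20)`). A slice stop past the end is silently clipped, while indexing a single element past the end does raise:

```python
import numpy as np

a = np.arange(20)  # valid indices are 0..19

# A slice whose stop (23) is past the end raises no error;
# it just returns fewer elements than requested
tail = a[18:23]
print(tail, tail.shape)  # [18 19] (2,)

# Indexing a single element past the end, by contrast, raises IndexError
try:
    a[23]
except IndexError as exc:
    print("IndexError:", exc)
```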

You'll have to rethink that `y` slicing if you want an (n,5)-shaped array.
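One possible rethink, as a sketch: stop the loop early enough that every target window still has its full length. `horizon` is a hypothetical parameter name added here for the target length (the question hard-codes 5); with `horizon=1` the loop bound reduces to the question's `len(dataset)-seq_size`.

```python
import numpy as np

def to_sequences(dataset, seq_size=1, horizon=5):
    """Split column 0 of `dataset` into input windows and target windows.

    `horizon` (hypothetical name) is the number of steps in each target.
    The loop stops early enough that every target window is full length.
    """
    x = []
    y = []
    for i in range(len(dataset) - seq_size - horizon + 1):
        x.append(dataset[i:i + seq_size, 0])
        y.append(dataset[i + seq_size:i + seq_size + horizon, 0])
    return np.array(x), np.array(y)

trainX, trainY = to_sequences(np.arange(20)[:, None], seq_size=5)
print(trainX.shape, trainY.shape)  # (11, 5) (11, 5)
```

The trade-off is fewer samples: with the 20-element toy input you get 11 rows instead of 15, because the last input windows have no complete 5-step target.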

Slicing off the end of a list:

In [252]: [1,2,3,4,5][1:4]
Out[252]: [2, 3, 4]
In [253]: [1,2,3,4,5][3:6]
Out[253]: [4, 5]
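If you're on NumPy 1.20 or later, `numpy.lib.stride_tricks.sliding_window_view` can build both arrays without a loop. It only produces complete windows, so there is no ragged tail to worry about. This is an alternative technique, not what the question's code does, and the 1-D `series` below is a stand-in for `dataset[:, 0]`:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

series = np.arange(20)  # stand-in for dataset[:, 0]
seq_size, horizon = 5, 5

# Every row holds seq_size + horizon consecutive values; only complete
# windows are produced, so all rows have the same length
windows = sliding_window_view(series, seq_size + horizon)
x = windows[:, :seq_size]   # first seq_size values of each window
y = windows[:, seq_size:]   # the following horizon values
print(x.shape, y.shape)  # (11, 5) (11, 5)
```

The result is a read-only view into `series`; call `.copy()` on `x` or `y` if you need to modify them.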
hpaulj