
I have 20-channel data, each channel with 5000 values (150,000+ records in total, stored as .npy files on disk).

I am following the Keras fit_generator tutorial at https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly.html to read the data (each record is read as a (5000, 20) NumPy array of type float32).

The architecture I have in mind has a parallel convolutional network for each channel, with the branches concatenated at the end, so the channels need to be fed to the model in parallel. Reading a single channel from each record and feeding it to a single network works:

def __data_generation(self, list_IDs_temp):
    'Generates data containing batch_size samples' # X : (n_samples, *dim, n_channels)
    # Initialization
    if(self.n_channels == 1):
        X = np.empty((self.batch_size, *self.dim))
    else:
        X = np.empty((self.batch_size, *self.dim, self.n_channels))
    y = np.empty((self.batch_size), dtype=int)

    # Generate data
    for i, ID in enumerate(list_IDs_temp):
        # Store sample
        d = np.load(self.data_path + ID + '.npy')
        d = d[:, self.required_channel]
        d = np.expand_dims(d, axis=-1)  # append a trailing channel axis
        X[i,] = d

        # Store class
        y[i] = self.labels[ID]

    return X, keras.utils.to_categorical(y, num_classes=self.n_classes)

However, when I read the whole record and feed it to the network, slicing out the channels with Lambda layers, I get an error (shown below).

Reading the whole record:

 X[i,] = np.load(self.data_path + ID + '.npy')

Using the Lambda slicing layer implementation from https://github.com/keras-team/keras/issues/890 and calling

input = Input(shape=(5000, 20))
slicedInput = crop(2, 0, 1)(input)

I am able to compile the model, and it shows the expected layer sizes.

When the data is fed to this network, I get

ValueError: could not broadcast input array from shape (5000,20) into shape (5000,1)

Any help would be much appreciated....

MuTaTeD

1 Answer

As mentioned in the GitHub thread you are referencing, a Lambda layer can return only one output, so the proposed crop(dimension, start, end) returns only a single "Tensor on a given dimension from start to end".

I believe what you want can be achieved like this:
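To make the slicing semantics concrete, here is a small NumPy sketch; basic indexing on a NumPy batch follows the same shape rules as slicing the Keras input tensor. It assumes that `crop(2, 0, 1)` from the thread amounts to `x[:, :, 0:1]`, and the batch size of 4 is an arbitrary choice for illustration.

```python
import numpy as np

# Hypothetical batch: 4 records, 5000 time steps, 20 channels.
batch = np.zeros((4, 5000, 20), dtype=np.float32)

# Assumed equivalent of crop(2, 0, 1): a range on the channel axis keeps that axis.
first_channel_kept = batch[:, :, 0:1]

# An integer index on the channel axis drops the axis.
first_channel_dropped = batch[:, :, 0]

# An integer index on axis 1 selects a time step, not a channel.
first_time_step = batch[:, 0]

print(first_channel_kept.shape)     # (4, 5000, 1)
print(first_channel_dropped.shape)  # (4, 5000)
print(first_time_step.shape)        # (4, 20)
```

Which form is appropriate depends on the per-channel sub-model: convolutional layers usually expect the trailing channel axis to be kept, while a `Dense` layer on the raw samples can work with the flat (5000,) vector.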

import keras  # needed for keras.optimizers below
from keras.layers import Dense, Concatenate, Input, Lambda
from keras.models import Model

num_channels = 20
input = Input(shape=(5000, num_channels))

branch_outputs = []
for i in range(num_channels):
    # Slicing the ith channel (last axis); i=i binds the current loop value:
    out = Lambda(lambda x, i=i: x[:, :, i])(input)

    # Setting up your per-channel layers (replace with actual sub-models):
    out = Dense(16)(out)
    branch_outputs.append(out)

# Concatenating together the per-channel results:
out = Concatenate()(branch_outputs)

# Adding some further layers (replace or remove with your architecture):
out = Dense(10)(out)

# Building model:
model = Model(inputs=input, outputs=out)    
model.compile(optimizer=keras.optimizers.Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

# --------------
# Generating dummy data:
import numpy as np
data = np.random.random((64, 5000, num_channels))
targets = np.random.randint(2, size=(64, 10))

# Training the model:
model.fit(data, targets, epochs=2, batch_size=32)
# Epoch 1/2
# 32/64 [==============>...............] - ETA: 1s - loss: 37.1219 - acc: 0.1562
# 64/64 [==============================] - 2s 27ms/step - loss: 38.4801 - acc: 0.1875
# Epoch 2/2
# 32/64 [==============>...............] - ETA: 0s - loss: 38.9541 - acc: 0.0938
# 64/64 [==============================] - 0s 4ms/step - loss: 36.0179 - acc: 0.1875
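The `ValueError` quoted in the question is the message NumPy produces when an array is assigned into a slot of a different shape, so it most likely comes from the `X[i,] = ...` line of the data generator rather than from the model. The sketch below reproduces it under the assumption that the generator's buffer is still sized for a single channel; the variable names are made up for illustration and `record` stands in for the result of `np.load(...)`.

```python
import numpy as np

batch_size, n_steps, n_channels = 2, 5000, 20

# Stand-in for one record loaded from disk (assumption: shape (5000, 20)).
record = np.zeros((n_steps, n_channels), dtype=np.float32)

# Buffer sized for one channel: assigning a full record fails.
X_single = np.empty((batch_size, n_steps, 1), dtype=np.float32)
try:
    X_single[0,] = record
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Buffer sized for all channels: the same assignment succeeds.
X_full = np.empty((batch_size, n_steps, n_channels), dtype=np.float32)
X_full[0,] = record
print(X_full.shape)
```

If this is indeed the cause, creating the generator with a buffer of shape (batch_size, 5000, 20) should let the full record through to the model, where the Lambda layers do the per-channel slicing.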
benjaminplanche
  • You can create a custom layer instead of using Lambdas and return multiple outputs as list. – nuric Jun 05 '18 at 14:30
  • Right. The loop is also here to set up "the parallel convolutional networks for each channel" OP mentions. – benjaminplanche Jun 05 '18 at 14:35
  • The code till model.summary() runs perfectly and I get the model layer/parameter information but when I compile the model to start the training like ---> model.compile(optimizer=optimizers.Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy']) I get the same error ---> ValueError: could not broadcast input array from shape (5000,20) into shape (5000,1) – MuTaTeD Jun 05 '18 at 17:22
  • How's your model input defined? It should be `input = Input(shape=(5000, 20))`. As you can see from my updated answer, things are compiling and training smoothly... – benjaminplanche Jun 05 '18 at 18:08
  • To avoid dropping the sliced dimension, index with a range instead of a single position inside the Lambda, i.e. use `i:i+1` rather than `i`. – Nir Oct 19 '21 at 15:24
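One more point about lambdas created inside a `for` loop: Python looks up the loop variable when the lambda is called, not when it is defined. Whether this affects a Keras model depends on when the framework invokes the function, so binding the index as a default argument is a defensive habit rather than a confirmed fix. A plain-Python sketch of the difference, with made-up names:

```python
row = ['a', 'b', 'c']

# Late binding: all three functions read the same variable i when called.
late = [lambda r: r[i] for i in range(3)]

# Early binding: the default argument captures the value of i at definition time.
bound = [lambda r, i=i: r[i] for i in range(3)]

late_results = [f(row) for f in late]
bound_results = [f(row) for f in bound]

print(late_results)   # ['c', 'c', 'c']
print(bound_results)  # ['a', 'b', 'c']
```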