
Some details for context:

  1. Working on Google Colab using TPU.
  2. Model is fitting successfully without any issues
  3. Running into issues while attempting to use the predict function

Here is the code I'm using to train:

tpu_model.fit(x, y,
          batch_size=128,
          epochs=60)

Here is the code I'm using to predict:

import sys
import numpy as np

def generate_output():
    generated = ''
    #sentence = text[start_index: start_index + Tx]
    #sentence = '0'*Tx
    usr_input = input("Write the beginning of your poem, the Shakespeare machine will complete it. Your input is: ")
    # zero pad the sentence to Tx characters.
    sentence = ('{0:0>' + str(maxlen) + '}').format(usr_input).lower()
    generated += usr_input 

    sys.stdout.write("\n\nHere is your poem: \n\n") 
    sys.stdout.write(usr_input)
    for i in range(400):

        x_pred = np.zeros((1, maxlen, len(chars)))

        for t, char in enumerate(sentence):
            if char != '0':
                x_pred[0, t, char_indices[char]] = 1.

        --> preds = tpu_model.predict(x_pred, batch_size=128, workers=8, verbose=0)[0]
        next_index = sample(preds, temperature = 1.0)
        next_char = indices_char[next_index]

        generated += next_char
        sentence = sentence[1:] + next_char

        sys.stdout.write(next_char)
        sys.stdout.flush()

        if next_char == '\n':
            continue

And here is the error (I've added an arrow above so you know the location of the error):

AssertionError: batch_size must be divisible by the number of TPU cores in use (1 vs 8)

This makes no sense to me: the batch size I used while training is divisible by 8, AND the batch size I've passed to my predict function is divisible by 8.

I'm not sure what the issue is and how to resolve it. Any help would be much appreciated.

madsthaks

1 Answer


From the error:

AssertionError: batch_size must be divisible by the number of TPU cores in use (1 vs 8)

It looks like you are using a batch_size of 1, which can be inferred from the first dimension of your input data:

x_pred = np.zeros((1, maxlen, len(chars)))

I think you might want to change it to:

x_pred = np.zeros((8, maxlen, len(chars)))

so that the batch dimension becomes 8, matching the number of TPU cores in use.

Alternatively, you can keep the current batch size of 1 but use a single TPU core.
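A minimal sketch of the padding idea, assuming sizes standing in for the question's `maxlen` and `len(chars)` (the exact values are hypothetical): pad the single example with zero rows up to the number of TPU cores, run predict on the padded batch, and keep only the first row of the output.

```python
import numpy as np

# Hypothetical sizes standing in for the question's maxlen and len(chars).
maxlen, n_chars, n_cores = 40, 57, 8

# One-hot input for a single example, as built in the question's loop.
x_single = np.zeros((1, maxlen, n_chars))

# Pad the batch dimension with zero rows so that
# batch_size % n_cores == 0, satisfying the TPU assertion.
x_pred = np.concatenate(
    [x_single, np.zeros((n_cores - 1, maxlen, n_chars))], axis=0
)
assert x_pred.shape[0] % n_cores == 0

# Then predict on the padded batch and keep only row 0:
# preds = tpu_model.predict(x_pred, batch_size=n_cores)[0]
```

The padding rows are all zeros, so their predictions are discarded; only the first row corresponds to the real input.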

greeness
  • Thanks! Do you know how to change the number of cores and is it possible for me to use 8 cores for training and 1 core for making predictions? – madsthaks Apr 29 '19 at 18:36
  • It's totally fine to use different cores for training vs serving. Your problem here is that you defined a `model_fn` that is NOT compatible with the cores you are using. Make sure your batch_size is a multiple of the number of TPU cores (in every mode: train, eval, predict). To select a different TPU type, look at the TPU documentation on Google Cloud, e.g. https://cloud.google.com/tpu/docs/deciding-tpu-version#accelerator-type – greeness Apr 29 '19 at 20:28