
I am reading a book about Deep Learning and I am currently learning about the Keras functional API. In this context:

"The input layer takes a shape argument that is a tuple that indicates the dimensionality of the input data. When input data is one-dimensional, such as for a Multilayer Perceptron, the shape must explicitly leave room for the shape of the minibatch size used when splitting the data when training the network. Therefore, the shape tuple is always defined with a hanging last dimension (2,), this is the way you must define a one-dimensional tuple in Python, for example:"

I did not quite understand the shape part. Why is the second element of the tuple left empty? What does leaving it empty mean? I know None means the layer could take any size, but what is happening here? Also, about the mini-batch size: isn't only one instance processed at a time in a neural network? With mini-batches, we update the weights (if using SGD) only after every batch of data has been evaluated by the model. Why, then, do we need to change the dimension of our input shape to accommodate this? Shouldn't only one data instance go through at a time?

Richi Dubey

1 Answer


If your data were two-dimensional (e.g. a greyscale image), the numpy array would have shape (height, width). With a one-dimensional input, though, you might be tempted to say its shape is just length. When you write (length,) instead, the difference is that you have not an integer but a tuple with one element.
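A quick sketch of that distinction in plain Python (the trailing comma is what makes it a tuple; the names here are just for illustration):

```python
length = 3
shape = (length,)        # a one-element tuple, thanks to the trailing comma
not_a_tuple = (length)   # parentheses alone don't make a tuple; this is just the int 3

print(type(shape))        # <class 'tuple'>
print(type(not_a_tuple))  # <class 'int'>
```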

The idea behind batches is that multiple instances are processed at once to speed up training. How exactly that works internally I am not sure of, but oftentimes you do have more than one instance in a batch. I believe gradient descent simply does not update the weights between individual instances; the weights are only updated once per batch, which means every instance in a batch can be computed in parallel.
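As a toy sketch of that idea in plain numpy (not how Keras implements it internally; a made-up linear model, with the batch size and learning rate chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 2))      # one mini-batch: 32 instances, 2 features each
y = X @ np.array([1.0, -2.0])     # targets from a known linear rule

w = np.zeros(2)                   # model weights, starting at zero
lr = 0.1

# One mini-batch step: the forward pass runs on all 32 instances at once,
# the gradient is averaged over the batch, and the weights are updated
# once per batch rather than once per instance.
pred = X @ w                      # batched forward pass
grad = X.T @ (pred - y) / len(X)  # mean gradient over the batch
w -= lr * grad                    # a single update for the whole batch
```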

My guess as to why they point out that the shape is a tuple is that there is no special-case handling for when the shape is just an integer. For example, you can loop over a tuple's entries, but not over an integer.

Notice also that the shape of a numpy array is itself a tuple:

>>> import numpy as np
>>> np.array([1,2,3]).shape
(3,)

so you can use array.shape directly if you wish to do so.

Technically, you could use batches but set the batch size to 1. That can be confusing though, because if you use `squeeze` somewhere, it will remove the batch dimension as well.
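For example, with numpy's `squeeze` (Keras tensors behave analogously):

```python
import numpy as np

batch = np.zeros((1, 2))         # a batch of size 1 with two features
print(batch.shape)               # (1, 2)
print(np.squeeze(batch).shape)   # (2,) -- squeeze dropped the batch dimension too
```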

lucidbrot
  • Hi, thanks for the answer. What you told me kinda makes sense, but consider the following code (where the data is two-dimensional): `inputs = keras.Input(shape=(28,28))`. This is for the MNIST dataset, and `model.summary()` shows that this input layer gives an output of shape (None, 28, 28), which takes care of batch processing I believe, since None means that any number of images of shape (28,28) can be passed. So, analogous to this, why is it (2,) and not (None,2) in my original question (for integer data)? I hope I am making sense. – Richi Dubey Feb 12 '22 at 05:19
  • I'm just guessing now, because I don't use keras often, but I believe using `None` to denote this is not a common convention; it's just for display. They could of course do as you say and interpret a `(None,2)` the same as a `(2,)`, but it seems they chose not to, maybe to avoid complex parsing in their code. The batch dimension is simply added later as a first dimension, and the other dimensions remain. Tbh I too find the documentation you quoted in your question confusing, but I _believe_ they just wanted to say "the shape must be a tuple, even if it's only 1 dimension". – lucidbrot Feb 12 '22 at 11:10
  • 1
    Yes - that is what is happening. when (2,) is being passed - the input shape is regarded as (None,2) (as I can see in the summary). Maybe it has something to do with Tuple vs Integer. I will look more into this. Again, thanks a lot for your help! Appreciate it. – Richi Dubey Feb 13 '22 at 12:32