
I'm training the following model:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=30, output_dim=64, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=1024)),
    tf.keras.layers.Dense(128, activation="sigmoid"),
    tf.keras.layers.Dense(10, activation="linear")
])

This network deals with text, so I turned each string in my dataset into a NumPy array by converting each character to a numeric value:

import numpy as np

def converter(fen):
    normal_list = []

    for letter in fen:
        if letter == "/" or letter == " " or letter == "-":
            normal_list.append(0)
        elif letter == "p":
            normal_list.append(1)
        elif letter == "P":
            normal_list.append(2)
        elif letter == "n":
            normal_list.append(3)
        elif letter == "N":
            normal_list.append(4)
        elif letter == "b":
            normal_list.append(5)
        elif letter == "B":
            normal_list.append(6)
        elif letter == "r":
            normal_list.append(7)
        elif letter == "R":
            normal_list.append(8)
        elif letter == "q":
            normal_list.append(9)
        elif letter == "Q":
            normal_list.append(10)
        elif letter == "k":
            normal_list.append(11)
        elif letter == "K":
            normal_list.append(12)
        elif letter == "a":
            normal_list.append(13)
        elif letter == "b":
            normal_list.append(14)
        elif letter == "c":
            normal_list.append(15)
        elif letter == "d":
            normal_list.append(16)
        elif letter == "e":
            normal_list.append(17)
        elif letter == "f":
            normal_list.append(18)
        elif letter == "g":
            normal_list.append(19)
        elif letter == "h":
            normal_list.append(20)
        elif letter == "1":
            normal_list.append(21)
        elif letter == "2":
            normal_list.append(22)
        elif letter == "3":
            normal_list.append(23)
        elif letter == "4":
            normal_list.append(24)
        elif letter == "5":       
            normal_list.append(25) 
        elif letter == "6":
            normal_list.append(26)
        elif letter == "7":
            normal_list.append(27)
        elif letter == "8":
            normal_list.append(28)
        elif letter == "9":
            normal_list.append(29)
        else:
            normal_list.append(0)
    
    return np.array(normal_list, ndmin=2).astype(np.float32)
    # I used ndmin = 2 because the embedding layer turns it into ndmin = 3

Then I loaded the dataset for training, converting each sample:

x_set = []
y_set = []

for position in df["position"]:
    x_set.append(cvt.converter(position))

len(x_set) is 950, and x_set[0].shape is (1, ?), where ? varies between 50 and 70.

For y_set, I used:

for a in range(len(df["position"])):
    y_set.append(np.array([
        df["Pawns"][a], df["Knights"][a], df["Bishops"][a], df["Rooks"][a],
        df["Queens"][a], df["Mobility"][a], df["King"][a], df["Threats"][a],
        df["Passed"][a], df["Space"][a]
    ], ndmin=2)) # If I don't use ndmin = 2 here I get ValueError: Data cardinality is ambiguous

Its length is also 950.

When I call model.fit(x_set, y_set, epochs=10), the model only seems to use one sample to train the network:

Epoch 1/10
1/1 [==============================] - 19s 19s/step - loss: 0.2291 - mae: 0.4116
Epoch 2/10
1/1 [==============================] - 3s 3s/step - loss: 0.1645 - mae: 0.3302
Epoch 3/10
1/1 [==============================] - 3s 3s/step - loss: 0.0764 - mae: 0.1982
Epoch 4/10
1/1 [==============================] - 3s 3s/step - loss: 1.4347 - mae: 1.0087
Epoch 5/10
1/1 [==============================] - 3s 3s/step - loss: 0.0038 - mae: 0.0461
Epoch 6/10
1/1 [==============================] - 3s 3s/step - loss: 0.0532 - mae: 0.1780
Epoch 7/10
1/1 [==============================] - 3s 3s/step - loss: 0.0597 - mae: 0.1931
Epoch 8/10
1/1 [==============================] - 3s 3s/step - loss: 0.0522 - mae: 0.1814
Epoch 9/10
1/1 [==============================] - 3s 3s/step - loss: 0.0375 - mae: 0.1583
Epoch 10/10
1/1 [==============================] - 3s 3s/step - loss: 0.0252 - mae: 0.1432

Shouldn't it be using all 950 samples of x_set? What is wrong with this code?

  • The model is getting trained on the full dataset, i.e. 950 samples, 10 times over; if needed, the batch size can be adjusted. I don't think it's using only one sample — where did you notice this? – simpleApp Apr 21 '21 at 17:17
  • `1` here means 1 batch, not 1 sample; see [Keras not training on entire dataset](https://stackoverflow.com/questions/61122276/keras-not-training-on-entire-dataset) – desertnaut Apr 21 '21 at 17:54

1 Answer


This line indicates it's training on one batch, not one sample:

1/1 [==============================] - 19s 19s/step - loss: 0.2291 - mae: 0.4116

The default batch_size in Keras is 32. A Keras Embedding layer expects integers, not floats, and each of your samples carries an extra leading dimension, so you should change this line in your converter:

return np.array(normal_list, ndmin=2).astype(np.float32)

To this:

return np.array(normal_list)
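
With that change, each converted sample should come back as a 1-D integer array; for example, the standard starting position (a 56-character string) gives:

>>> converter("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1").shape
(56,)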

You want each training sample to have a shape of (?,), where ? is 50-70 in your case, and each target to have a shape of (10,), because your model outputs 10 values from its last Dense layer. Across all samples, x_set should then have a shape of (950, ?) and y_set a shape of (950, 10). To avoid issues, you should pad all of your samples to the same length instead of letting them vary between 50 and 70, as sketched below.
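
A minimal sketch of how you could build those arrays, assuming the df columns and the fixed converter from the question (pad_sequences zero-pads every sequence up to the longest one, which also works with your mask_zero=True embedding):

import numpy as np
import tensorflow as tf

# One 1-D integer sequence per position (converter now returns a 1-D array)
sequences = [converter(position) for position in df["position"]]

# Zero-pad to a common length -> x_set.shape == (950, max_len)
x_set = tf.keras.preprocessing.sequence.pad_sequences(sequences, padding="post")

# The ten target columns as one array -> y_set.shape == (950, 10)
target_columns = ["Pawns", "Knights", "Bishops", "Rooks", "Queens",
                  "Mobility", "King", "Threats", "Passed", "Space"]
y_set = df[target_columns].to_numpy().astype(np.float32)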

Your model expects this input:

>>> model.input_shape
(None, None)

Your model.summary() is as follows (the first None dimension is the batch dimension, along which your 950 samples get split into batches):

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_2 (Embedding)      (None, None, 64)          1920      
_________________________________________________________________
bidirectional (Bidirectional (None, 2048)              8921088   
_________________________________________________________________
dense (Dense)                (None, 128)               262272    
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 9,186,570
Trainable params: 9,186,570
Non-trainable params: 0
_________________________________________________________________

In short, you're embedding your entire training set into a single sample, I believe.
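
With the shapes fixed (x_set of shape (950, max_len) and y_set of shape (950, 10)), the same model.fit(x_set, y_set, epochs=10) call will split the data into ceil(950 / 32) = 30 batches under the default batch_size, so the progress bar should read 30/30 per epoch instead of 1/1.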

mmiron