
I am facing a problem training a neural network in Google Colab. My model is not training on the complete training dataset, even though I have uploaded it to Drive and provided the proper path. Here is the code I have written:

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Activation, Dropout
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import mean_squared_error, mean_absolute_error, max_error, r2_score
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

X=pd.read_csv('/content/drive/My Drive/ML Data/prob_232_full.dat', sep=r"\s+", header=None)
y=pd.read_csv('/content/drive/My Drive/ML Data/pGuess_232_full.dat', sep=r"\s+", header=None)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X.astype(np.float64), y.astype(np.float64), test_size = 0.25, random_state = 1)

X_train = np.array(X_train)
X_test = np.array(X_test)

# Keras expects the labels as one-dimensional vectors
y_train = np.array(y_train).reshape((-1,))
y_test = np.array(y_test).reshape((-1,))

ncols=X_train.shape[1]

model = Sequential()

model.add(Dense(activation="relu", input_dim=ncols, units=64, kernel_initializer="uniform"))
model.add(Dense(activation="relu", units=128, kernel_initializer="uniform"))
model.add(Dense(activation="relu", units=256, kernel_initializer="uniform"))
model.add(Dense(activation="relu", units=64, kernel_initializer="uniform"))
model.add(Dense(activation="relu", units=1, kernel_initializer="uniform"))

opt=keras.optimizers.Adam(learning_rate=0.0001)
model.compile(optimizer = opt, loss='mean_squared_error', metrics=['mean_absolute_error'])
history=model.fit(X_train, y_train, validation_data=(X_test, y_test), 
                  batch_size = 32, epochs = 40, verbose=1)

The size of the training set is 457500, but the training output shows the model training on only 14297 of the training data.

– sdatta

1 Answer


Welcome to Stack Overflow.

Your training set has 457,500 samples and you are passing batch_size=32 to model.fit, so Keras splits each epoch into 457500 / 32 ≈ 14297 steps: 14,296 full batches of 32 samples plus one final batch of 28 samples (4 fewer than 32). The 14297 you see in the progress bar is the number of batches per epoch, not the number of samples, so all of your data is being used and everything is working as expected. It's just a matter of reading the output correctly.
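A minimal sketch of that arithmetic in plain Python (the numbers are taken from your question; nothing Keras-specific is needed):

import math

n_samples = 457500   # size of the training set from the question
batch_size = 32      # the batch_size passed to model.fit

# Keras counts steps (batches) per epoch, rounding up so the final
# partial batch is still trained on
steps_per_epoch = math.ceil(n_samples / batch_size)
full_batches, last_batch = divmod(n_samples, batch_size)

print(steps_per_epoch)  # 14297
print(full_batches)     # 14296 full batches of 32 samples
print(last_batch)       # 28 samples in the final batch (4 fewer than 32)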

– Imran
    +1 Had the same 'issue'; everything is actually fine with Google Colab. The video I was following used a Jupyter notebook on a local computer. Neither of us specified a batch_size; his output showed 50,000 and mine showed 1563, and 1563 * 32 = 50,016. Nice to know it's all working fine though. – Levitybot Dec 06 '20 at 20:50