I'm new to CNNs, but I'm trying to build one with the Keras functional API on the CIFAR-10 dataset, using DenseNet201 for transfer learning. The problem is that I'm getting very low accuracy. I've looked through my textbook examples and the documentation, but I can't figure out why it's so low when it should start much higher. This is my setup on TensorFlow 2.7:
import tensorflow as tf

# load in data
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
X_train, X_test = X_train / 255.0, X_test / 255.0
# one-hot encode the labels
y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)
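# sanity check: X_train.shape is (50000, 32, 32, 3), y_train.shape is (50000, 10)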
# apply DenseNet's preprocess_input before feeding the model
X_testP = tf.keras.applications.densenet.preprocess_input(X_test)
X_trainP = tf.keras.applications.densenet.preprocess_input(X_train)
# input shape of the CIFAR-10 images
inputs = tf.keras.Input(shape=(32,32,3))
# DenseNet expects 224x224 inputs, so resize with a Lambda layer
resized_images = tf.keras.layers.Lambda(lambda image: tf.image.resize(image, (224, 224)))(inputs)
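# resized_images now has shape (None, 224, 224, 3)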
# load the pretrained DenseNet201 base
transfer = tf.keras.applications.DenseNet201(include_top=False, weights='imagenet', pooling='max', input_tensor=resized_images, input_shape=(224, 224, 3))
# add my own layers on top
x = transfer.output
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(256, activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dense(200, activation='relu')(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
x = tf.keras.layers.Dense(64, activation='relu')(x)
output = tf.keras.layers.Dense(10, activation='softmax')(x)
transfer_model = tf.keras.Model(inputs=transfer.input, outputs=output)
transfer_model.trainable = False  # freeze the weights before compiling
# here I try SGD, but I also tried Adam with no better results
optimizer = tf.keras.optimizers.SGD(learning_rate=0.2, momentum=0.9, decay=0.01)
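# optimizer = tf.keras.optimizers.Adam()  # swapping in Adam like this didn't help either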
transfer_model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
history_transfer = transfer_model.fit(X_trainP, y_train, epochs=20)
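# evaluating on the preprocessed test set shows the same very low accuracy
test_loss, test_acc = transfer_model.evaluate(X_testP, y_test)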
All the examples I've seen start much higher, even without extra layers on top. Am I misunderstanding something in how the model is initialized?