Questions tagged [custom-training]

90 questions
1
vote
0 answers

Microbatching (accumulating gradients) in Tensorflow 2.x with tf.function

How can micro-batching be implemented in TensorFlow 2.x? That is, I would like to accumulate gradients over several batches and then update the weights with these accumulated gradients (this would virtually increase my batch size to accumulation steps…
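
For readers skimming this listing, the idea behind the question can be sketched without any framework. This is a minimal pure-Python illustration, not the asker's code: the toy linear model and the names `grad_sum`, `accumulated_step`, and `full_batch_step` are assumptions made for the example. It sums per-example gradients over several micro-batches and applies a single update, which should match one update on the full batch. In TensorFlow 2.x the same pattern would typically add gradients from `tf.GradientTape` into accumulator variables and call `optimizer.apply_gradients` once per accumulation window.

```python
# Framework-agnostic sketch of gradient accumulation (micro-batching).
# Toy model (an assumption for illustration): y_hat = w * x + b, MSE loss.

def grad_sum(w, b, xs, ys):
    """Return the SUM (not the mean) of per-example MSE gradients for w and b."""
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb

def accumulated_step(w, b, xs, ys, micro_batch_size, lr):
    """One weight update built from several micro-batches."""
    acc_w = acc_b = 0.0
    for start in range(0, len(xs), micro_batch_size):
        stop = start + micro_batch_size
        gw, gb = grad_sum(w, b, xs[start:stop], ys[start:stop])
        acc_w += gw  # accumulate; the weights are NOT updated here
        acc_b += gb
    n = len(xs)  # divide by the total example count to get the mean gradient
    return w - lr * acc_w / n, b - lr * acc_b / n

def full_batch_step(w, b, xs, ys, lr):
    """Reference: one update computed on the whole batch at once."""
    gw, gb = grad_sum(w, b, xs, ys)
    n = len(xs)
    return w - lr * gw / n, b - lr * gb / n

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
w_acc, b_acc = accumulated_step(0.0, 0.0, xs, ys, micro_batch_size=2, lr=0.01)
w_full, b_full = full_batch_step(0.0, 0.0, xs, ys, lr=0.01)
```

Dividing the accumulated sum by the total number of examples, rather than averaging per-micro-batch means, keeps the result correct even when the last micro-batch is smaller than the others.
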
1
vote
2 answers

Creating models in a loop makes Keras increasingly slower

When I train a model multiple times, the training iterations slow down, even if all the relevant quantities are created inside a for loop (and should therefore be overwritten each time, which should be enough to avoid creating growing computation…
1
vote
0 answers

Tensorflow training slows down after each epoch

I use a Titan Xp GPU. The code is below, but I can't figure out where the problem is. Why does the training time for each epoch increase constantly? Initially I can process around 180 batches per minute, but after three epochs I can only process 5…
1
vote
1 answer

Slow training on CPU and GPU in a small network (tensorflow)

Here is the original script I'm trying to run on both CPU and GPU. I'm expecting much faster training on GPU; however, it's taking almost the same time. I made the following modification to main() (the first 4 lines) because the original script does…
1
vote
1 answer

simple inception block in pytorch taking much much longer time to train on GPU?

I am training a very simple inception block followed by a maxpool and a fully-connected layer on an NVIDIA GeForce RTX 2070 GPU, and it's taking a very long time per iteration. It just finished 10 iterations in more than 24 hours. Here is the code for…
MSD Paul
1
vote
1 answer

How to plot history of training metrics in Sagemaker .py training

I am running a notebook in Sagemaker and I use a .py file for training: tf_estimator = TensorFlow(entry_point='train_cnn.py', role=role, train_instance_count=1, …
1
vote
0 answers

Neural network takes time to train even after freezing all layers

In TensorFlow, after I set the trainable flag of each layer to False, attempting to train the network does not change the weights (as expected). However, each epoch still takes the same amount of time (about 12 seconds) to train, just like training…
1
vote
1 answer

Forward Pass calculation on current batch in "get_updates" method of Keras SGD Optimizer

I am trying to implement a stochastic Armijo rule in the get_gradient method of the Keras SGD optimizer. Therefore, I need to calculate another forward pass to check whether the chosen learning_rate was good. I don't want another calculation of the…
1
vote
1 answer

Why is the 'filters' set as (classes + 5) * 3 in this article?

Here's a tutorial about doing custom training of YOLO (Darknet): https://medium.com/@manivannan_data/how-to-train-yolov3-to-detect-custom-objects-ccbcafeb13d2 The tutorial explains how to set values in the .cfg files: classes = Number of classes,…
Dee
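
The formula in the question title can be read as a per-anchor count. A hedged sketch follows, assuming the usual YOLOv3 layout in which each detection scale uses 3 anchor boxes and every anchor predicts 4 box coordinates, 1 objectness score, and one score per class. The helper name `yolo_conv_filters` is made up for this example and is not part of Darknet.

```python
def yolo_conv_filters(num_classes, anchors_per_scale=3):
    """Filters needed in the conv layer feeding a YOLOv3 detection layer.

    Assumption: each anchor predicts 4 box coordinates (x, y, w, h),
    1 objectness score, and num_classes class scores.
    """
    box_coords = 4
    objectness = 1
    return (num_classes + box_coords + objectness) * anchors_per_scale

print(yolo_conv_filters(1))   # 18
print(yolo_conv_filters(80))  # 255
```

So the "5" is the four box coordinates plus the objectness score, and the "3" is the number of anchors per scale under this assumption.
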
0
votes
1 answer

Using GCSFuse vs NFS share for custom training on Vertex AI

We are currently using GCS Fuse with Google Cloud Storage during our training and are seeing very slow performance. The bug seems to be with Google and they are actively working on the Fuse Bug. I was wondering if someone has tried setting up an NFS…
0
votes
0 answers

Mask RCNN model for Image segmentation is stuck on First epoch

I am dealing with an issue where my model is stuck on the 1st epoch (look below). I am using this library, which is a fork of the original Mask-RCNN library: https://github.com/alsombra/Mask_RCNN-TF2 The dataset that I am currently using has 27 images with…
0
votes
0 answers

(semi) supervised learning; The custom trainings loop doesn't train the model properly. The training seems to ignore any weights

I am trying to code a semi-supervised model for a side project outside university. First I had a model that I trained with model.fit (I tested the model with supervised learning first). But for semi-supervised learning I need flexibility, therefore I…
0
votes
1 answer

customized training loop parametric optimization

I implemented a custom training loop for a custom loss function that also incorporates the constraints of an unsupervised parametric optimization problem. The corresponding training loop then creates multiple epoch_loss outputs (the amount of the…
0
votes
1 answer

Error in number of inputs when using CustomDataGenerator with Keras model

I am trying to create a Keras model which takes two separate pieces of information at different stages (an image first, then concatenates two coordinates at the point of fully-connected layers). When I run my code, I am stuck at an…
ROS
0
votes
0 answers

"train" is not defined Pylance(reportUndefinedVariable) [Ln 124, Col 13]

I have defined my "train" via a def function, and the lines before my def do not seem to have any issues. However, the last line that mentions "train" gives an undefined-variable error. Here is my code; this is not the full code, and I am working in VS…