Questions tagged [sgd]

63 questions
1 vote • 0 answers

How to understand a periodicity in the training loss using a pre-trained model of PyTorch?

I'm using a pre-trained model from PyTorch (ResNet 18/34/50) in order to classify images. During training, a weird periodicity appears in the training loss, as you can see in the image below. Did somebody already have a similar issue? In order to deal…
1 vote • 0 answers

How to implement momentum and decay correctly - SGD

I am trying to apply momentum and decay to a mini-batch SGD: what would be the right way to update my weights? I get weird results as soon as decay is set. import numpy as np def _mini_batch(self,X,y,batch_size): # sack data for shuffle -…
Bennimi • 416 • 5 • 14
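
Below is a minimal NumPy sketch of one way to combine momentum and L2-style weight decay in a mini-batch SGD update (velocity = momentum·velocity − lr·(grad + decay·w)); the function and variable names are illustrative and not taken from the question's code.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9, decay=1e-4):
    """One SGD step with momentum and L2 weight decay (illustrative sketch).

    Weight decay is applied by adding decay * w to the gradient, the usual
    "weight decay as L2 penalty" convention.
    """
    grad = grad + decay * w                     # L2 weight decay
    velocity = momentum * velocity - lr * grad  # momentum accumulation
    w = w + velocity
    return w, velocity

# Toy usage on a linear least-squares model: minimize ||X @ w - y||^2
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
w, v = np.zeros(3), np.zeros(3)
for epoch in range(50):
    idx = rng.permutation(len(X))
    for batch in np.array_split(idx, 10):       # mini-batches of 10
        grad = 2 * X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
        w, v = sgd_momentum_step(w, grad, v)
```
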
1 vote • 1 answer

Using ModuleList, still getting ValueError: optimizer got an empty parameter list

With PyTorch I am attempting to use ModuleList to ensure model parameters are detected and can be optimized. When calling the SGD optimizer I get the following error: ValueError: optimizer got an empty parameter list. Can you please review the…
racycle • 13 • 3
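
A short PyTorch sketch of the usual fix: keep the layers in an nn.ModuleList rather than a plain Python list, so they are registered and model.parameters() is no longer empty; the toy architecture here is made up for illustration.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, sizes=(4, 8, 2)):
        super().__init__()
        # nn.ModuleList (not a plain Python list) registers the submodules,
        # so model.parameters() is non-empty and the optimizer can find them.
        self.layers = nn.ModuleList(
            [nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]
        )

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = torch.relu(layer(x))
        return self.layers[-1](x)

model = MLP()
assert len(list(model.parameters())) > 0   # would be 0 with a plain list
opt = torch.optim.SGD(model.parameters(), lr=0.1)
```
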
1 vote • 1 answer

NaN values with SGD optimizer in Keras for regression NN

I am trying to train a NN for regression. When using the SGD optimizer class from Keras I suddenly get NaN values as predictions from my network after the first step. Before, I was running training with the Adam optimizer class and everything worked fine. I…
Perschi • 11 • 2
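
A hedged sketch of settings that often prevent the loss from blowing up to NaN when switching from Adam to plain SGD in Keras (a smaller learning rate plus gradient clipping via clipnorm); the model and random data below are placeholders, not from the question.

```python
import numpy as np
import tensorflow as tf

# Placeholder regression data
X = np.random.rand(256, 10).astype("float32")
y = np.random.rand(256, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])

# SGD is far more sensitive to the learning rate than Adam; a smaller lr
# combined with clipnorm often keeps the MSE loss from exploding to NaN.
opt = tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9, clipnorm=1.0)
model.compile(optimizer=opt, loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```
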
1 vote • 1 answer

How to manipulate client gradients in TensorFlow Federated SGD

I'm following this tutorial to get started with TensorFlow Federated. My aim is to run federated SGD (not federated averaging) with some manipulations of client gradient values before they are sent to the server. Before moving forward, to briefly…
Saam • 385 • 1 • 3 • 12
1 vote • 1 answer

Why does `partial_fit` in `SGDClassifier` suffer from a gradual reduction in model accuracy?

I am training an online-learning SVM classifier using SGDClassifier in sklearn. I learnt that it is possible using partial_fit. My model definition is: model = SGDClassifier(loss="hinge", penalty="l2", alpha=0.0001, max_iter=3000, tol=1e-3,…
Pe Dro • 2,651 • 3 • 24 • 44
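
A minimal scikit-learn sketch of incremental training with partial_fit: the full class list is passed on the first call and feature scaling is fitted once and reused, since drifting or unscaled inputs are one common reason accuracy degrades across batches; the data here is synthetic.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

scaler = StandardScaler().fit(X[:200])      # fit the scaler once, reuse it
model = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4, tol=1e-3)

classes = np.unique(y)                      # must be supplied on the first call
for start in range(0, len(X), 200):
    Xb = scaler.transform(X[start:start + 200])
    yb = y[start:start + 200]
    model.partial_fit(Xb, yb, classes=classes)

print(model.score(scaler.transform(X), y))
```
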
1 vote • 0 answers

Cannot set momentum of optimizer

I am using the SGD optimizer and want to set the momentum after initialization, similar to learning rate scheduling, by using tf.keras.backend.set_value(optimizer.momentum, momentumValue):…
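
One possible workaround, sketched under the assumption of a TF 2.x eager setup: pass the momentum in as a tf.Variable you own and update that variable later, analogously to how learning-rate schedules mutate optimizer.lr. Whether tf.keras.backend.set_value works directly depends on whether the hyperparameter is stored as a variable in your version, so treat this strictly as an illustration.

```python
import tensorflow as tf

# Keep the momentum in a variable we control, so it can be changed later.
momentum_var = tf.Variable(0.5, trainable=False, dtype=tf.float32)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=momentum_var)

# ... later, e.g. from a callback, update it like a scheduled learning rate:
momentum_var.assign(0.9)
```
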
1 vote • 2 answers

Decrease the maximum learning rate after every restart

I'm training a neural network for a computer vision-based task. For the optimizer, I found out that it isn't ideal to use a single learning rate for the entire training, and what people do is use learning rate schedulers to decay the…
cronin • 83 • 1 • 1 • 6
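
A small framework-free sketch of cosine annealing with warm restarts in which the peak (maximum) learning rate is itself multiplied by a decay factor after every restart; all constants are illustrative, and the returned value can be written into the optimizer's learning rate each step.

```python
import math

def lr_with_decaying_restarts(step, cycle_len=1000, lr_max=0.1, lr_min=1e-4,
                              peak_decay=0.5):
    """Cosine-annealed LR that restarts every cycle_len steps, with the
    maximum LR shrinking by peak_decay after each restart."""
    cycle = step // cycle_len                  # how many restarts so far
    pos = (step % cycle_len) / cycle_len       # position within current cycle
    peak = lr_max * (peak_decay ** cycle)      # decayed maximum for this cycle
    return lr_min + 0.5 * (peak - lr_min) * (1 + math.cos(math.pi * pos))

# e.g. assign this to optimizer.param_groups[0]["lr"] (PyTorch) each step
for s in (0, 500, 999, 1000, 1500, 2000):
    print(s, round(lr_with_decaying_restarts(s), 5))
```
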
1 vote • 1 answer

NaN by Matrix Factorization

I implemented matrix factorization using the SGD algorithm, but I frequently get NaN values in the predicted matrix when I run it. When I run the algorithm on a very tiny (6 x 7) matrix, the number of times that the error appears is small. As I have…
Infinity • 307 • 6 • 16
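
A compact NumPy sketch of SGD matrix factorization with a small learning rate and L2 regularization, the two knobs that usually decide whether the updates diverge to NaN; the dimensions and constants are made up.

```python
import numpy as np

def mf_sgd(R, k=3, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Factorize R ≈ P @ Q.T over observed entries with plain SGD.
    A too-large lr (or no regularization) lets the updates blow up and
    produce NaNs; keep lr small and the factors initialized near zero."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    P = 0.1 * rng.normal(size=(n, k))
    Q = 0.1 * rng.normal(size=(m, k))
    rows, cols = np.nonzero(~np.isnan(R))      # observed entries only
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = R[i, j] - P[i] @ Q[j]
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * P[i] - reg * Q[j])
    return P, Q

R = np.array([[5, 3, np.nan], [4, np.nan, 1], [1, 1, 5]], dtype=float)
P, Q = mf_sgd(R)
print(np.round(P @ Q.T, 2))
```
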
1 vote • 1 answer

SGD optimiser graph

I just wanted to ask a quick question. I understand that val_loss and train_loss are insufficient to tell if the model is overfitting. However, I wish to use them as a rough gauge by monitoring whether the val_loss is increasing. As I use the SGD optimiser, I…
0 votes • 0 answers

Implementing sklearn SGD with l2 regularization from scratch

I need some help or an idea about what's going wrong in my code. I am trying to implement the SGD regressor using L2 regularization, but the bias in my model reaches a very high value (when alpha is above 10). I think something is wrong with the gradient and its…
Vadim • 1 • 2
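
A short NumPy sketch of an SGD regressor with an L2 penalty in which, as in scikit-learn, the penalty is applied to the weights but not to the intercept (regularizing or mis-scaling the bias update is a frequent cause of the kind of blow-up described); everything here is illustrative.

```python
import numpy as np

def sgd_ridge(X, y, alpha=0.1, lr=0.01, epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            err = X[i] @ w + b - y[i]
            w -= lr * (err * X[i] + alpha * w)   # L2 penalty on weights only
            b -= lr * err                        # intercept is NOT regularized
    return w, b

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=200)
w, b = sgd_ridge(X, y, alpha=0.1)
print(np.round(w, 2), round(b, 2))
```
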
0 votes • 0 answers

Ways to access the gradient during an SGD in Vowpal Wabbit

I am attempting to alter the implementation of the Stochastic Gradient Descent (SGD) algorithm in Vowpal Wabbit for logistic regression purposes. I aim to access the computed gradient at each step and perform extra operations before applying the SGD…
0 votes • 0 answers

Optimization graph: average training loss per epoch and average testing loss per epoch

Testing and training function def CNN_training(image, label, layers, alpha=2.0): # Forward step output, loss, accuracy = CNN_forward(image, label, layers) # Initial gradient gradient = np.zeros(10) gradient[label] = -1/output[label] # Backprop…
Elfs • 1 • 3
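
A generic sketch of accumulating the average training and testing loss per epoch and plotting both curves; train_step and eval_step below are hypothetical placeholders standing in for whatever the question's CNN_training / CNN_forward functions compute.

```python
import numpy as np
import matplotlib.pyplot as plt

def train_step(x, y):   # placeholder: loss of one training example
    return float(np.random.rand())

def eval_step(x, y):    # placeholder: loss of one test example
    return float(np.random.rand())

train_data = [(None, None)] * 100
test_data = [(None, None)] * 20
train_hist, test_hist = [], []

for epoch in range(10):
    train_losses = [train_step(x, y) for x, y in train_data]
    test_losses = [eval_step(x, y) for x, y in test_data]
    train_hist.append(np.mean(train_losses))   # average loss this epoch
    test_hist.append(np.mean(test_losses))

plt.plot(train_hist, label="train avg loss")
plt.plot(test_hist, label="test avg loss")
plt.xlabel("epoch"); plt.ylabel("average loss"); plt.legend(); plt.show()
```
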
0 votes • 0 answers

How to formulate the SGD (Stochastic Gradient Descent) method to solve for the maximum "energy" in a dynamical system?

What I face: I would like to maximize the "energy" of a variable x governed by a dynamical system equation dx/dt = g(x) at time T. The "energy" of x is defined as E(t) = x^2/2, since x is a function of time. x could be a vector or a scalar. The…
jengmge • 77 • 1 • 5
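
One way to cast this as plain gradient ascent, sketched with forward-Euler integration of dx/dt = g(x) and a finite-difference gradient of E(T) = x(T)^2/2 with respect to (here, illustratively) the initial condition x0; g, T, and the step sizes are placeholders.

```python
import numpy as np

def g(x):                       # placeholder dynamics dx/dt = g(x)
    return -0.5 * x + np.sin(x)

def x_at_T(x0, T=1.0, dt=1e-3):
    """Integrate dx/dt = g(x) from x0 to time T with forward Euler."""
    x = np.array(x0, dtype=float)
    for _ in range(int(T / dt)):
        x = x + dt * g(x)
    return x

def energy(x0):                 # E(T) = x(T)^2 / 2 (summed if x is a vector)
    xT = x_at_T(x0)
    return 0.5 * np.sum(xT ** 2)

# Gradient ascent on E(T) w.r.t. the initial condition, finite differences
x0 = np.array([0.3, -0.2])
lr, eps = 0.1, 1e-5
for _ in range(100):
    grad = np.array([(energy(x0 + eps * e) - energy(x0 - eps * e)) / (2 * eps)
                     for e in np.eye(len(x0))])
    x0 = x0 + lr * grad         # ascent step: increase the energy
print(x0, energy(x0))
```
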
0 votes • 0 answers

SGD optimizer not working on cost function

I wanted to make my own neural network for a speech data set, using TensorFlow. I am writing the code: I imported the library and the dataset, then did one-hot encoding, then assigned the weights and biases, and then did the forward…
Christopher Marlowe • 2,098 • 6 • 38 • 68