Convergence in Logistic Regression in Distributed TensorFlow

Question

I'm trying to implement logistic regression in distributed TensorFlow, and I want to add a convergence check to my algorithm in addition to the upper bound on the number of iterations. The convergence criterion I plan to use is

||prevW - currW|| < E

where prevW is the previous value of the model weights, currW is the current value, and E is the convergence tolerance.

My question is about the previous model weights. Since I am using between-graph replication and asynchronous training, I don't know when each worker in the cluster will update the weights. Say a worker has computed new weights from a batch and wants to check whether the algorithm has converged so that it can stop. Should it use the weights available in its local replica (that is, the corresponding tensor), or should it evaluate the tensor to fetch the latest updated value before continuing with the current computation? I tried the approach described above, but the algorithm did not converge and only stopped once the upper bound on iterations was reached.

Thank you in advance for your help :D

score 0 · Answer 1 · answered May 03 '17 at 22:04

I would do the convergence check on the same device where the variables live. That way you avoid copying the weight tensors over the network just to compare them. This can be done by building the check inside a with tf.device(variable.device): block.
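A minimal sketch of that placement, under several assumptions: it uses the tf.compat.v1 graph-mode API, a single process where "/cpu:0" stands in for the parameter server device, and hypothetical names (weights, prev_weights, TOLERANCE, check_after_update) that do not come from the question. The placeholder assignment replaces the real gradient step.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

TOLERANCE = 1e-4  # hypothetical value for the tolerance E

graph = tf.Graph()
with graph.as_default():
    # Assumption: in a real cluster the variables would be placed on a
    # parameter server (e.g. via tf.train.replica_device_setter).
    # "/cpu:0" stands in for that device in this single-process sketch.
    with tf.device("/cpu:0"):
        weights = tf.Variable(tf.zeros([3]), name="weights")
        prev_weights = tf.Variable(
            tf.zeros([3]), trainable=False, name="prev_weights")

    # Stand-in for a training step: overwrite the weights with a fed value.
    new_value = tf.placeholder(tf.float32, shape=[3])
    update = tf.assign(weights, new_value)

    # Build the check on the device that holds the variables, so only the
    # scalar result has to travel back to whoever runs the op.
    with tf.device(weights.device):
        delta = tf.norm(weights - prev_weights)
        converged = tf.less(delta, TOLERANCE)
        save_prev = tf.assign(prev_weights, weights)

    init = tf.global_variables_initializer()


def check_after_update(sess, value):
    """Apply one fake update, test ||prevW - currW|| < E, then store currW."""
    sess.run(update, feed_dict={new_value: value})
    done = bool(sess.run(converged))
    sess.run(save_prev)
    return done


with tf.Session(graph=graph) as sess:
    sess.run(init)
    first_check = check_after_update(sess, [1.0, 0.0, 0.0])   # weights moved
    second_check = check_after_update(sess, [1.0, 0.0, 0.0])  # no movement
```

Here prev_weights is a shared variable on the same device, so every worker compares against the last stored value. Keeping a per-worker copy instead is the other option raised in the question.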

Alexandre Passos

Thank you for your response! :) In my case the variables are stored on the parameter server. Even if I place the convergence check on a worker, which weights should I treat as the previous ones: the ones last computed by the worker that is checking for convergence, or the ones last stored on the ps, which another worker may have changed since the training is asynchronous? – nikosprov May 04 '17 at 14:47
I think that's an algorithm question which should be decided experimentally (i.e. try both and see which one is more robust). Convergence tests with stochastic gradients can be tricky, because seeing a gradient that leads to no movement doesn't mean you will never see a gradient that leads to movement. – Alexandre Passos May 04 '17 at 16:49
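
One possible way to make the test less sensitive to a single step with no movement is to require several consecutive small steps before stopping. This is an illustrative sketch in plain Python, not something proposed in the thread; the class name PatienceConvergence and the patience parameter are assumptions.

```python
import math


class PatienceConvergence:
    """Report convergence only after `patience` consecutive small steps."""

    def __init__(self, tolerance, patience):
        self.tolerance = tolerance
        self.patience = patience
        self.small_steps = 0

    def update(self, prev_w, curr_w):
        # Euclidean norm of the weight change, ||prevW - currW||.
        step = math.sqrt(sum((c - p) ** 2 for p, c in zip(prev_w, curr_w)))
        if step < self.tolerance:
            self.small_steps += 1
        else:
            # Any large step resets the streak.
            self.small_steps = 0
        return self.small_steps >= self.patience


checker = PatienceConvergence(tolerance=1e-3, patience=3)
history = [
    [0.0, 0.0],
    [0.5, 0.5],
    [0.5, 0.5],  # one step with no movement ...
    [0.6, 0.5],  # ... followed by movement again
    [0.6, 0.5],
    [0.6, 0.5],
    [0.6, 0.5],
]
flags = [checker.update(p, c) for p, c in zip(history, history[1:])]
```

With this history the single stalled step in the middle does not stop training; only the final run of three unchanged steps does.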
