
Context: I am solving a PDE using a CNN, and I need the gradient of P w.r.t. X.

P = [p1 p2; p3 p4], X = [x1 x2; x3 x4]

I need the gradient of P w.r.t. X, but TensorFlow gives me the sum of all the partials.

Stack Overflow doesn't allow me to post in LaTeX format, so here is the same question posted on Math Stack Exchange. Please follow this link for better clarity:

https://math.stackexchange.com/questions/3408736/finding-gradients-in-tensorflow-without-sum

What do I need to do to get the gradients I need? I'd really appreciate your help; searching for similar questions here didn't help.
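
For reference, here is a minimal sketch of the behaviour described above (assuming TF 1.x graph mode, as used later in this thread; the linear `P = matmul(X, W)` and the constant `W` are only stand-ins for the actual CNN output):

```python
import numpy as np
import tensorflow as tf   # TF 1.x graph mode

X = tf.placeholder(tf.float32, shape=[2, 2])   # X = [x1 x2; x3 x4]
W = tf.constant([[1.0, 2.0], [3.0, 4.0]])      # stand-in weights
P = tf.matmul(X, W)                            # stand-in for the CNN output P = [p1 p2; p3 p4]

# tf.gradients computes d(sum_k p_k)/dX: each entry of `g` is the sum of the partials of
# *all* entries of P w.r.t. that entry of X, not an individual dp_i/dx_j.
g = tf.gradients(P, X)[0]                      # shape [2, 2]

with tf.Session() as sess:
    print(sess.run(g, feed_dict={X: np.ones((2, 2), np.float32)}))
    # [[3. 7.]
    #  [3. 7.]]  -- summed partials, not the individual dp_i/dx_j values
```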

1 Answer


If I understand correctly, you need the Jacobian matrix. See https://www.tensorflow.org/api_docs/python/tf/test/compute_gradient and https://stackoverflow.com/a/48880275/3712254.
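
As a hedged sketch (TF 1.x graph mode, hard-coded for the 2x2 case in the question; the helper name `jacobian` is only illustrative), the full Jacobian can be assembled entry by entry in the spirit of the linked answer:

```python
import tensorflow as tf   # TF 1.x graph mode

def jacobian(P, X):
    # Returns a [2, 2, 2, 2] tensor J with J[i, j] = dP[i, j]/dX for 2x2 P and X.
    rows = []
    for i in range(2):
        cols = []
        for j in range(2):
            cols.append(tf.gradients(P[i, j], X)[0])   # [2, 2] block of partials of one p_ij
        rows.append(tf.stack(cols))                    # [2, 2, 2]
    return tf.stack(rows)                              # [2, 2, 2, 2]
```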

bantmen
  • Thanks. But I don't need the Jacobian. If you click on that link, you'll see it. I actually need (partial p_i)/(partial x_i). – laplacian18 Oct 26 '19 at 19:32
  • Can't you take the diagonal of the Jacobian, then reshape it from (16) to (4, 4)? – bantmen Oct 26 '19 at 19:37
  • That's a good idea. In my case, P is 2x2 and X is 2x2, so doing Z=Jacobian(P,X) in TensorFlow returns a [2,2,2,2] tensor that has (partial p_i)/(partial x_j). So a diagonal operation doesn't quite help here. I need to extract the needed (partial p_i)/(partial x_i) by slicing Z, which is getting super slow, because this process has to be done many times during training. If you have a better way, I'd really appreciate it!! – laplacian18 Oct 26 '19 at 23:44
  • That actually shouldn't be too slow, because you do the slice once at graph construction and not continuously at runtime; have you run it and found it to be slow? – bantmen Oct 27 '19 at 12:48
  • Yea, you are right. I observed that the first evaluation takes long; after that, it's not bad. I have a follow-up question though. What I actually need is the sum of all the partials (partial p_i)/(partial x_i). Now, if I start with **sum=0** and keep doing **sum=sum+partial** inside the for-loop, will that create a new sum-increment op for each iteration? – laplacian18 Oct 29 '19 at 19:28
  • Yep, `sum`, at the end of that for loop, will be the addition of all of the partials because if `sum` and `partial` are tensors, then `+` should resolve to a `tf.add` operation. Alternatively, see if you can use https://www.tensorflow.org/api_docs/python/tf/math/reduce_sum. – bantmen Oct 29 '19 at 19:29
  • 1) Ok, after the for-loop ends, I understand that `sum` will hold the sum of all my required partials. What I meant is: does the graph create a new op each time the `sum=sum+partial` line is executed in the for-loop, and will it simply fill up my memory? – laplacian18 Oct 29 '19 at 21:45
  • 2) Regarding `reduce_sum`, I am sceptical that I can use it, because I don't need the entire sum, nor the sum reduced along a certain axis. If J is my Jacobian, what I need is J[0,0,0,0]+J[0,1,0,1]+J[1,0,1,0]+J[1,1,1,1]. So perhaps `reduce_sum` is not useful for such selective summing. – laplacian18 Oct 29 '19 at 21:45
  • 1) It will create separate ops but that might not fill up your memory. What is your for-loop over? Feel free to paste some code here. – bantmen Oct 29 '19 at 22:05
  • `sum=[] for i in np.arange(2): for j in np.arange(2): d1=tf.gradients(P[i,j], X)[0] #outputs a [2,2] tensor d2=tf.gradients(d1[i,j], X)[0] #finds the second derivative #also a [2,2] tensor sum.append(tf.slice(d2,[i,j],[1,1])) #Finally sum contains all second derivatives (\partial^2 p_i)/(\partial (x_i)^2)` – laplacian18 Oct 31 '19 at 15:58
  • Sorry, I am not sure how to format the pasted code. (Here it is with proper indents: http://txt.do/1k2ch; a cleaned-up version is also sketched below.) So in the code, I am actually computing the Laplacian of P w.r.t. X: basically I am creating a list *sum* of only the needed partials (\partial^2 p_i)/(\partial (x_i)^2). I will finally sum them later when needed. Hope it is clear what I'm evaluating. – laplacian18 Oct 31 '19 at 16:07
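
For readers hitting the same formatting problem, here is a cleaned-up sketch of the snippet from the comments above (assuming TF 1.x graph mode and that `P` and `X` are already-defined [2, 2] tensors; the `sum=0` initialisation is replaced with a list, since the loop appends to it, and `tf.add_n` replaces the repeated `+`):

```python
import tensorflow as tf   # TF 1.x graph mode; P and X are assumed to be [2, 2] tensors

partials = []                               # the needed second derivatives
for i in range(2):
    for j in range(2):
        d1 = tf.gradients(P[i, j], X)[0]    # first derivatives of p_ij, a [2, 2] tensor
        d2 = tf.gradients(d1[i, j], X)[0]   # second derivatives, also a [2, 2] tensor
        partials.append(d2[i, j])           # keep only d^2 p_ij / d x_ij^2

# Sum of the selected second derivatives, built as a single op rather than a chain of adds.
laplacian = tf.add_n(partials)

# Alternative discussed above: if the full Jacobian J (shape [2, 2, 2, 2]) has already been
# built, the selective sum J[0,0,0,0] + J[0,1,0,1] + J[1,0,1,0] + J[1,1,1,1] can be written as:
#   needed = tf.gather_nd(J, [[0, 0, 0, 0], [0, 1, 0, 1], [1, 0, 1, 0], [1, 1, 1, 1]])
#   total = tf.reduce_sum(needed)
```

As noted in the comments, these ops are created once at graph construction; only the first evaluation pays that cost, not every training step.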