
Currently, there are a lot of deep learning models developed in Caffe rather than TensorFlow. If I want to re-write these models in TensorFlow, how should I start? I am not familiar with the Caffe structure. It seems to me that there are some files storing only the model architecture. My guess is that I only need to understand and translate that architecture design into TensorFlow; the input/output/training code will be re-written anyway. Is this thought reasonable?

I also see that some Caffe implementations hack into the original Caffe framework down to the C++ level and make modifications. I am not sure in what kind of scenario a Caffe model developer needs to go that deep. If I just want to re-implement their models in TensorFlow, do I need to check their C++ modifications, which are sometimes not documented at all?

I know there are some Caffe-to-TensorFlow conversion tools, but they always come with constraints, and I think rewriting the model directly may be more straightforward.

Any thoughts, suggestions, and links to tutorials are highly appreciated.

user288609
  • I don't know Caffe, but I ported some Torch models to TF without knowing any Torch or Lua to start with. The general strategy was to start with the smallest possible network, make results match for a single forward pass, and then keep adding things to both the TF and Torch sides while checking for equality. Maybe you can take a similar approach for Caffe. – Yaroslav Bulatov Nov 02 '16 at 17:54
  • Usually a Caffe model developer needs to go to the C++ level to add a new operation to Caffe. While a particular operation may already exist in TensorFlow, you can't be sure that it does the same thing. So in general you need to understand what that custom code does. – dm0_ Nov 02 '16 at 20:47

1 Answer


I have already asked a similar question.

To synthesize the possible answers:

  1. You can use a pre-existing tool like ethereon's caffe-tensorflow (which is really simple to use). But its simplicity comes at a cost: it is not easy to debug.

  2. As @Yaroslav Bulatov suggested, start from scratch and try to make each layer match. In this regard I would advise you to look at ry's GitHub repository, which is a remarkable example: it basically consists of small helper functions that show how to reshape the weights appropriately from Caffe to TensorFlow (the only real work you have to do to make simple models match), and it also checks the activations layer by layer.
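As an illustration of the kind of reshaping involved (a hedged sketch, not ry's actual helpers; the function names are placeholders), Caffe and TensorFlow simply order the weight dimensions differently:

```python
import numpy as np

def caffe_conv_to_tf(w):
    # Caffe stores convolution kernels as (out_ch, in_ch, kh, kw);
    # TensorFlow's conv2d expects filters as (kh, kw, in_ch, out_ch).
    return np.transpose(w, (2, 3, 1, 0))

def caffe_fc_to_tf(w):
    # Caffe stores fully connected (InnerProduct) weights as (out, in);
    # TensorFlow's matmul convention expects (in, out).
    return w.T
```

For example, a 7x7 Caffe kernel of shape (64, 3, 7, 7) becomes a TF filter of shape (7, 7, 3, 64).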

jeandut
  • Thank you Jean. Regarding the activation check, what does that mean? Could you explain it a bit more? – user288609 Nov 04 '16 at 02:37
  • The code from ry is pretty much self-explanatory, but the principle is this: you choose some input, pass it through the network one layer at a time, and check that the norm of the difference between the activations you get from the Caffe layer and the activations you get from the TensorFlow layer is below a certain threshold. – jeandut Nov 04 '16 at 08:25
  • ry's function is: def same_tensor(a, b): return np.linalg.norm(a - b) < 0.1. No idea why he chose 0.1 as the threshold; I guess he had good reasons. – jeandut Nov 04 '16 at 08:26
  • Of course, this requires having built pycaffe so you can access the numpy arrays representing the Caffe activations in Python. – jeandut Nov 04 '16 at 08:28
  • But the activation check is just a safety net. Once you understand how Caffe performs convolutions and matrix multiplications compared to TensorFlow, it should produce the right activations. To my mind, the only really tricky part is when you flatten your tensor to go fully connected: you really have to think it through so that the following matrix multiplication in Caffe produces the same results as in TF, but ry's code explains it perfectly. – jeandut Nov 04 '16 at 08:32
  • That code works for simple layers, but if you dig deeper into ry's repo you will find code to convert Caffe's ResNets to TensorFlow, with a conversion protocol for batch normalization layers and skip connections. I guess if you want to convert more complex networks you'll have to do the conversions yourself, but the method remains the same. – jeandut Nov 04 '16 at 08:37
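Putting the comments above together, the layer-by-layer activation check can be sketched as follows. This is an illustrative outline, not ry's actual code: `same_tensor` is the criterion quoted above, while `caffe_acts` and `tf_acts` are hypothetical dictionaries mapping layer names to NumPy activation arrays (e.g. pulled from pycaffe blobs and from evaluating the TF graph on the same input):

```python
import numpy as np

def same_tensor(a, b, threshold=0.1):
    # ry's criterion: the norm of the difference must stay under a threshold.
    return np.linalg.norm(a - b) < threshold

def first_mismatch(caffe_acts, tf_acts, layer_names):
    # Walk the network layer by layer and return the first layer whose
    # activations diverge, so a porting bug is localized to a single layer.
    for name in layer_names:
        if not same_tensor(caffe_acts[name], tf_acts[name]):
            return name
    return None  # all layers match
```

Checking layers in network order is what makes this useful as a safety net: the first mismatching layer points directly at the weight reshape (or flatten) that was done incorrectly.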