
I am trying to replicate the results from https://arxiv.org/abs/1602.02697, but with images of size 224x224x3, following the black-box tutorial https://github.com/tensorflow/cleverhans/blob/master/cleverhans_tutorials/mnist_blackbox.py

However, I am hitting a memory-consumption error (pasted below). It seems to me that the Jacobian-based dataset augmentation could be the source of the issue: https://github.com/tensorflow/cleverhans/blob/master/cleverhans/utils_tf.py#L657

Yet I don't know how to verify that.

I am running the code on an 8 GB GPU.

Could it be that this method doesn't work on larger images? How can I fix this? What is the memory complexity of the method?

...
2019-02-07 18:21:32.984709: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 7.31GiB
2019-02-07 18:21:32.984715: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats:
Limit:                  7860224000
InUse:                  7848987648
MaxInUse:               7848987648
NumAllocs:                10041921
MaxAllocSize:           2424832000

2019-02-07 18:21:32.984831: W tensorflow/core/common_runtime/bfc_allocator.cc:271] ****************************************************************************************************
2019-02-07 18:21:32.984849: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at transpose_op.cc:199 : Resource exhausted: OOM when allocating tensor with shape[4,256,56,56] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

1 Answer


This is most likely explained by the fact that the size of X_batch doubles at each iteration of the loop over p_idxs. If you replace L698-703 with a call to the batch_eval helper provided in CleverHans, you will most likely be able to compute this even on ImageNet-sized inputs. If this solves your problem, feel free to submit it as a PR to CleverHans on GitHub.
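To see why an 8 GB GPU runs out, here is a rough back-of-envelope estimate (my own numbers, not from the paper) of the memory needed just to hold the augmented substitute-training set, assuming float32 inputs and 1000 seed samples:

```python
# Rough estimate of the memory needed just to hold the augmented dataset,
# assuming float32 (4 bytes per value) and 224x224x3 inputs.
# The substitute-training set doubles at every augmentation iteration.
bytes_per_float = 4
h, w, c = 224, 224, 3
input_bytes = h * w * c * bytes_per_float  # one image: ~0.57 MiB

n_seed = 1000
for p in range(6):  # augmentation iterations
    n = n_seed * 2 ** p  # dataset size doubles each iteration
    gib = n * input_bytes / 2 ** 30
    print(f"iteration {p}: {n:>6} samples, ~{gib:.1f} GiB of raw data")
```

By around iteration 4 the raw data alone approaches 9 GiB, before counting the per-class gradient tensors, so an OOM on an 8 GB card is expected unless the evaluation is batched.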

  • Thank you for your answer. I tried to replace L698-703 with the call: `grad_val = batch_eval(sess, [x], [tf.sign(grads)], [X_batch], batch_size=1)[0]`, but when I tested it against MNIST blackbox tutorial, I get an assertion error on output tensor: `... line 481, in batch_eval assert e.shape[0] == cur_batch_size, e.shape AssertionError: (10, 1, 28, 28, 1)` – Martin Matak Feb 16 '19 at 16:43
  • Looks like you will have to squeeze out the dimension 1 (the second) before passing the data batch to the batch_eval method. – Nicolas Papernot Feb 18 '19 at 06:32
  • That `1` represents `batch_size`, so I would assume it should stay. 28x28x1 represents WxHxC, and the first dimension represents how many grads need to be computed (one grad per class), i.e. `10` in this case. I believe this is what leads to a problem with the age-estimation task: computing grads for 101 classes and 224x224 images requires > 8 GB of RAM on the GPU. That said, I think the solution is not as easy as replacing the sess.run() with batch_eval(), i.e. computing it in batches. Even if it were, I am not sure how to pass it to batch_eval then. What do you think about this? – Martin Matak Feb 18 '19 at 22:05
  • Now I have reduced the number of classes (from 100) to 4 and kept the 224x224x3 images. I started augmentation of 1000 samples, but the process got killed by the kernel (due to resource exhaustion). – Martin Matak Feb 19 '19 at 18:21
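For reference, the batching idea discussed in these comments can be sketched framework-agnostically. `batch_eval_np` below is a hypothetical NumPy stand-in for CleverHans' `batch_eval`, shown only to illustrate why peak memory stays bounded by the batch size rather than by the full augmented set:

```python
import numpy as np

def batch_eval_np(fn, data, batch_size=32):
    """Apply fn to fixed-size slices of data and concatenate the results.

    Peak memory is bounded by one batch rather than by len(data), which
    matters once the Jacobian augmentation has doubled the set a few times.
    """
    outputs = [fn(data[i:i + batch_size])
               for i in range(0, len(data), batch_size)]
    return np.concatenate(outputs, axis=0)

# Stand-in for evaluating tf.sign(grads) over a large augmented batch.
X = np.random.randn(100, 28, 28, 1).astype(np.float32)
signs = batch_eval_np(np.sign, X, batch_size=16)
assert signs.shape == X.shape
```

The same slicing idea applies per class: rather than materializing all per-class gradient tensors at once, evaluate and reduce them class by class (or batch by batch) so only one slice is resident on the GPU at a time.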