
I'm trying to train a cGAN on a g4dn.xlarge GPU EC2 instance, and it crashes every time after exactly 8 epochs with the following message:

Traceback (most recent call last):
  File "pix2pix_tf2.py", line 841, in <module>
    main()
  File "pix2pix_tf2.py", line 802, in main
    results = sess.run(fetches, options=options, run_metadata=run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 958, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1181, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: 2 root error(s) found.
  (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
  (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
     [[{{node TensorArrayV2Write/TensorListSetItem}}]]
  (1) Invalid argument: 2 root error(s) found.
  (0) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
  (1) Invalid argument: During Variant Host->Device Copy: non-DMA-copy attempted of tensor type: string
0 successful operations.
0 derived errors ignored.
     [[{{node TensorArrayV2Write/TensorListSetItem}}]]
     [[Func/encode_images/target_pngs/while/body/_47/input/_154/_773]]
0 successful operations.
0 derived errors ignored.

Environment: TensorFlow 2.2.0, CUDA V10.0.130, cuDNN 7.6.5

talonmies
user3424107

1 Answer


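Updating CUDA to 10.1 solved the issue. TensorFlow 2.2.0's prebuilt GPU packages are built against CUDA 10.1 and cuDNN 7.6, so a CUDA 10.0 toolkit does not match what this TensorFlow release expects.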
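One way to catch this kind of mismatch before starting a long training run is to compare the installed CUDA and cuDNN versions against TensorFlow's tested build configurations. The sketch below assumes you already have the version strings (for example, the `V10.0.130` reported by `nvcc --version`). `check_cuda_match`, `major_minor` and `TESTED_CONFIGS` are hypothetical helper names, not TensorFlow APIs. The table is a hand-copied subset of the official Linux GPU table, so verify it against the current documentation before relying on it.

```python
import re

# Hand-copied SUBSET of TensorFlow's "tested build configurations" (Linux GPU):
# TensorFlow major.minor -> (CUDA major.minor, cuDNN major.minor).
# Hypothetical helper table; consult the official list for other releases.
TESTED_CONFIGS = {
    "2.0": ("10.0", "7.4"),
    "2.1": ("10.1", "7.6"),
    "2.2": ("10.1", "7.6"),
    "2.3": ("10.1", "7.6"),
}


def major_minor(version: str) -> str:
    """Reduce a version string such as 'V10.0.130' or '2.2.0' to 'major.minor'."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"no version number found in {version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def check_cuda_match(tf_version: str, cuda_version: str, cudnn_version: str) -> list:
    """Return a list of mismatch messages; an empty list means the versions match."""
    key = major_minor(tf_version)
    if key not in TESTED_CONFIGS:
        return [f"TensorFlow {key} is not in this (partial) table"]
    want_cuda, want_cudnn = TESTED_CONFIGS[key]
    problems = []
    if major_minor(cuda_version) != want_cuda:
        problems.append(
            f"CUDA {major_minor(cuda_version)} installed, TensorFlow {key} expects {want_cuda}"
        )
    if major_minor(cudnn_version) != want_cudnn:
        problems.append(
            f"cuDNN {major_minor(cudnn_version)} installed, TensorFlow {key} expects {want_cudnn}"
        )
    return problems


# The environment from the question:
print(check_cuda_match("2.2.0", "V10.0.130", "7.6.5"))
# prints ['CUDA 10.0 installed, TensorFlow 2.2 expects 10.1']
```

Only the major and minor components are compared, because the tested-configuration table lists versions at that granularity. Patch releases such as 10.0.130 versus 10.0.89 are treated as equivalent.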

user3424107