!python -m torch.distributed.launch --nproc_per_node=8 /root/examples/run_squad.py \
--model_type bert \
--model_name_or_path bert-large-uncased-whole-word-masking \
--do_train \
--do_eval \
--do_lower_case \
--train_file /root/DATA/train-v2.0.json \
--predict_file /root/DATA/dev-v2.0.json \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ../root/result/ \
--per_gpu_eval_batch_size=3 \
--per_gpu_train_batch_size=3
I'm using Google Colab and I want to train on a Q&A dataset that I downloaded from the SQuAD website. But when I run the command above, it returns an error.
Can someone help me fix this problem? The full error message follows, and I'd appreciate any suggestions:
This is the error message (each of the 8 launched processes prints the same traceback, interleaved with the others, so one copy is shown, followed by the launcher's own traceback):

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=50 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "/root/examples/run_squad.py", line 575, in <module>
    main()
  File "/root/examples/run_squad.py", line 469, in main
    torch.cuda.set_device(args.local_rank)
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 300, in set_device
    torch._C._cuda_setDevice(device)
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 193, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:50

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', '/root/examples/run_squad.py', '--local_rank=7', '--model_type', 'bert', '--model_name_or_path', 'bert-large-uncased-whole-word-masking', '--do_train', '--do_eval', '--do_lower_case', '--train_file', '/root/DATA/train-v2.0.json', '--predict_file', '/root/DATA/dev-v2.0.json', '--learning_rate', '3e-5', '--num_train_epochs', '2', '--max_seq_length', '384', '--doc_stride', '128', '--output_dir', '../root/result/', '--per_gpu_eval_batch_size=3', '--per_gpu_train_batch_size=3']' returned non-zero exit status 1.
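
For anyone trying to reproduce this, a quick check of what the runtime actually exposes might help narrow things down. The traceback shows every launched process calling torch.cuda.set_device(args.local_rank), so --nproc_per_node=8 presumably needs 8 visible GPUs. The sketch below is only a diagnostic under that assumption: visible_gpu_count and check_nproc are hypothetical helper names, not part of run_squad.py, while torch.cuda.is_available() and torch.cuda.device_count() are standard PyTorch calls.

```python
def visible_gpu_count():
    """Return how many CUDA devices PyTorch can see (0 if none, or if torch is missing)."""
    try:
        import torch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.device_count()


def check_nproc(requested, available):
    """Report whether `requested` processes (one GPU each) fit the visible GPUs.

    Hypothetical helper: it assumes each process binds to its own device,
    as the set_device(args.local_rank) call in the traceback suggests.
    """
    if available == 0:
        return "no CUDA device visible: enable a GPU runtime before launching"
    if requested > available:
        return f"requested {requested} processes but only {available} GPU(s) visible"
    return "ok"


if __name__ == "__main__":
    # 8 matches --nproc_per_node=8 in the command above.
    print(check_nproc(requested=8, available=visible_gpu_count()))
```

If this reports fewer than 8 devices, lowering --nproc_per_node to the reported count (or switching the Colab runtime to a GPU one when it reports none) would be the first thing to try.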