I am using a cluster (similar to slurm but using condor) and I wanted to run my code using VS code (its debugger specially) and it's remote sync extension.
I tried running it using my debugger in VS code but it didn't quite work as expected.
First I logged in to the cluster using VS code and remote sync as usual and that works just fine. Then I go ahead an get an interactive job with the command:
condor_submit -i request_cpus=4 request_gpus=1
then that successfully gives a node/gpu to use.
Once I have that I try to run the debugger but somehow it logs me out from the remote session (and it looks like it goes to the head node from the print statements). That's NOT what I want. I want to run my job in the interactive session in the node/gpu I was allocated. Why is VS code running it in the wrong place? How can I run it in the right place?
Some of the output from the integrated terminal:
source /home/miranda9/miniconda3/envs/automl-meta-learning/bin/activate
/home/miranda9/miniconda3/envs/automl-meta-learning/bin/python /home/miranda9/.vscode-server/extensions/ms-python.python-2020.2.60897-dev/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/launcher /home/miranda9/automl-meta-learning/automl/automl/meta_optimizers/differentiable_SGD.py
conda activate base
(automl-meta-learning) miranda9~/automl-meta-learning $ source /home/miranda9/miniconda3/envs/automl-meta-learning/bin/activate
(automl-meta-learning) miranda9~/automl-meta-learning $ /home/miranda9/miniconda3/envs/automl-meta-learning/bin/python /home/miranda9/.vscode-server/extensions/ms-python.python-2020.2.60897-dev/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/launcher /home/miranda9/automl-meta-learning/automl/automl/meta_optimizers/differentiable_SGD.py
--> main in differentiable SGD
hello world torch_utils!
vision-sched.cs.illinois.edu
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
-> initialization of DiMO done!
---> i = 0, iteration/it 1 about to start
lp_norms(mdl) = 18.43514633178711
lp_norms(meta_optimized mdl) = 18.43514633178711
[e=0,it=1], train_loss: 2.304989814758301, train error: -1, test loss: -1, test error: -1
---> i = 1, iteration/it 2 about to start
lp_norms(mdl) = 18.470401763916016
lp_norms(meta_optimized mdl) = 18.470401763916016
[e=0,it=2], train_loss: 2.3068909645080566, train error: -1, test loss: -1, test error: -1
---> i = 2, iteration/it 3 about to start
lp_norms(mdl) = 18.548133850097656
lp_norms(meta_optimized mdl) = 18.548133850097656
[e=0,it=3], train_loss: 2.3019633293151855, train error: -1, test loss: -1, test error: -1
---> i = 0, iteration/it 1 about to start
lp_norms(mdl) = 18.65604019165039
lp_norms(meta_optimized mdl) = 18.65604019165039
[e=1,it=1], train_loss: 2.308889150619507, train error: -1, test loss: -1, test error: -1
---> i = 1, iteration/it 2 about to start
lp_norms(mdl) = 18.441967010498047
lp_norms(meta_optimized mdl) = 18.441967010498047
[e=1,it=2], train_loss: 2.300947666168213, train error: -1, test loss: -1, test error: -1
---> i = 2, iteration/it 3 about to start
lp_norms(mdl) = 18.545459747314453
lp_norms(meta_optimized mdl) = 18.545459747314453
[e=1,it=3], train_loss: 2.30662202835083, train error: -1, test loss: -1, test error: -1
-> DiMO done training!
--> Done with Main
(automl-meta-learning) miranda9~/automl-meta-learning $ conda activate base
(automl-meta-learning) miranda9~/automl-meta-learning $ hostname vision-sched.cs.illinois.edu
Doesn't even run without debugging mode
The problem is more serious than I thought. I can't run the debugger in the interactive session but I can't even "Run Without Debugging" without it switching to the Python Debug Console on it's own. So that means I have to run things manually with python main.py
but that won't allow me to use the variable pane...which is a big loss!
What I am doing is switching my terminal to the conoder_ssh_to_job
and then clicking the button Run Without Debugging
(or ^F5
or Control + fn + f5
) and although I made sure to be on the interactive session at the bottom in my integrated window it goes by itself to the Python Debugger window/pane which is not connected to the interactive session I requested from my cluster...
related: