
When I train insightface with the train_parall.py script (insightface's MXNet model-parallel training script) and change config.num_workers, I get this error:

Traceback (most recent call last):
  File "train_parall.py", line 434, in <module>
    main()
  File "train_parall.py", line 430, in main
    train_net(args)
  File "train_parall.py", line 282, in train_net
    from parall_module_dist import ParallModule
ModuleNotFoundError: No module named 'parall_module_dist'

I think I am missing a module, but I have installed all of MXNet. Do I need to install anything else?

Snow_Sun

1 Answer


OK, I figured this out on my own. The error means Python cannot find parall_module_dist.py to import. The MXNet model-parallel implementation in insightface does not support multi-node training, so if you change config.num_workers (something like the rank/world size in PyTorch), you will hit a dead end.
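If you hit the same error, one way to confirm that it is a missing local file rather than a missing pip package is to ask Python's import system directly. The sketch below uses the standard-library importlib.util.find_spec. require_single_worker is a hypothetical helper (not part of insightface or MXNet) that assumes, per the answer above, that num_workers must stay at 1, and fails early with a clearer message instead of a bare ModuleNotFoundError.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the top-level module on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


def require_single_worker(num_workers: int) -> None:
    """Hypothetical guard, not part of insightface.

    Assumes multi-node model parallelism is unsupported, so any num_workers > 1
    without parall_module_dist.py on sys.path is rejected with a clear message.
    """
    if num_workers > 1 and not is_importable("parall_module_dist"):
        raise RuntimeError(
            "num_workers > 1 needs parall_module_dist.py, which is not on sys.path; "
            "multi-node model parallelism is not supported here"
        )


if __name__ == "__main__":
    print("parall_module_dist importable:", is_importable("parall_module_dist"))
    # These are the directories Python searches for imports; when run as a
    # script, sys.path[0] is the directory containing that script.
    for entry in sys.path:
        print(entry)
```

Running this from the same directory as train_parall.py shows whether parall_module_dist.py is visible there. If it is not, no MXNet package will provide it, since it is a local file rather than part of MXNet.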

Snow_Sun