
I am running a Tornado app as shown below:

app = make_app()
server = tornado.httpserver.HTTPServer(app)
server.bind(8888)
server.start(0)  # autodetect number of cores and fork a process for each
print("server started at port 8888")
tornado.ioloop.IOLoop.instance().start()

This successfully starts the app on all available cores. This is the code that runs on each API call:

ctx = mx.cpu(0)
_, arg_params, aux_params = mx.model.load_checkpoint(args.prefix, args.epoch)
arg_params, aux_params = ch_dev(arg_params, aux_params, ctx)
sym = resnet_50(num_class=2)
arg_params["data"] = mx.nd.array(img, ctx)
arg_params["im_info"] = mx.nd.array(im_info, ctx)
exe = sym.bind(ctx, arg_params, args_grad=None, grad_req="null", aux_states=aux_params)
print("detect 4")
tic = time.time()
print("detect 5")
exe.forward(is_train=False)
print("detect 6")
output_dict = {name: nd for name, nd in zip(sym.list_outputs(), exe.outputs)}
rois = output_dict['rpn_rois_output'].asnumpy()[:, 1:] 

When the Tornado app runs on a single core it works fine, but with multiple cores the code above runs to its last line and then fails with this error:

Segmentation fault: 11

Stack trace returned 10 entries:
[bt] (0) 0   libmxnet.so                         0x0000000116ef741f _ZN5mxnet15segfault_loggerEi + 63
[bt] (1) 1   libsystem_platform.dylib            0x00007fff6ce4af5a _sigtramp + 26
[bt] (2) 2   libsystem_malloc.dylib              0x00007fff6cd73cc0 malloc_zone_calloc + 87
[bt] (3) 3   CarbonCore                          0x00007fff46798117 _ZL22connectToCoreServicesDv + 258
[bt] (4) 4   CarbonCore                          0x00007fff46797fe4 _ZL9getStatusv + 24
[bt] (5) 5   CarbonCore                          0x00007fff46797f62 scCreateSystemServiceVersion + 49
[bt] (6) 6   CarbonCore                          0x00007fff46799392 FileIDTreeGetCachedPort + 213
[bt] (7) 7   CarbonCore                          0x00007fff467991f2 FSNodeStorageGetAndLockCurrentUniverse + 79
[bt] (8) 8   CarbonCore                          0x00007fff46799080 FileIDTreeGetAndLockVolumeEntryForDeviceID + 38
[bt] (9) 9   CarbonCore                          0x00007fff46798fdd _ZN7FSMountC2Ej17FSMountNumberTypePiPKj + 75
child 3 (pid 42579) exited with status 255, restarting
chetan dev

1 Answer


I ran into a similar problem when using mxnet with multiprocessing and OpenCV. I wasn't using Tornado, but the symptoms were the same: everything worked fine in a single process, but as soon as I enabled multiprocessing I got segmentation faults.

It turned out that my problem was related to this issue: https://github.com/opencv/opencv/issues/5150, and I fixed it by calling cv2.setNumThreads(0) at the beginning of my code. Since you are using ResNet, I assume you also depend on OpenCV.
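As a rough sketch of where that call could go (the OpenCV Python binding spells it cv2.setNumThreads), something like the following might work. The helper name disable_opencv_threading is my own and not part of OpenCV or Tornado, and I am assuming the call needs to run in the parent process before server.start(0) forks the workers:

```python
import importlib


def disable_opencv_threading():
    """Turn off OpenCV's internal thread pool if cv2 is installed.

    Hypothetical helper: returns True when cv2 was found and configured,
    False when OpenCV is not available in this environment.
    """
    try:
        cv2 = importlib.import_module("cv2")
    except ImportError:
        return False
    # 0 asks OpenCV to run its routines sequentially, without worker threads.
    cv2.setNumThreads(0)
    return True


if __name__ == "__main__":
    # Assumption: call this once at startup, before server.start(0) forks.
    configured = disable_opencv_threading()
    print("OpenCV threading disabled:", configured)
```

If the crash still happens, it may be worth calling the helper again at the top of the request handler, since I have not verified how the setting behaves across a fork.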

I also noticed that quite a few segmentation fault issues were fixed in mxnet version 1.1, so if you are not on that version yet, I recommend upgrading, as it is much more stable.
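To check which version is installed, a small sketch like this could help. The helpers major_minor and is_at_least are hypothetical names of mine; the only mxnet attribute used is mx.__version__, and the (1, 1) threshold simply mirrors the version mentioned above:

```python
import re


def major_minor(version):
    """Extract (major, minor) from a version string such as '1.1.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def is_at_least(version, minimum=(1, 1)):
    """Return True if the version is at or above the minimum (major, minor)."""
    return major_minor(version) >= minimum


if __name__ == "__main__":
    try:
        import mxnet as mx
    except ImportError:
        print("mxnet is not installed")
    else:
        print(mx.__version__, "is 1.1 or newer:", is_at_least(mx.__version__))
```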

Sergei