
I'm working on Accelio on top of softRoCE.

IB devices configured:
# ibv_devices 
    device                 node GUID
    ------              ----------------
    rxe1                821f02fffef91598
    rxe0                d6bed9fffebe94af
Error while running the Accelio client:
# xio_ow_client 
 =============================================
 Server Address     : 127.0.0.1
 Server Port        : 2061
 Transport      : rdma
 Header Length      : 32
 Data Length        : 32
 Connection Index   : 0
 CPU Affinity       : 0
 Finite run     : 0
 =============================================
**** starting ...
session event: connection error. reason: No such device

# rping -c
rdma_resolve_route: No such device

Hence I checked the OpenSM status:
# /etc/init.d/opensmd status
opensm is stopped
# /etc/init.d/opensmd start
opensm start [FAILED]

# tail -f /var/log/opensm.log 
Jul 09 15:04:45 655213 [AA4F3700] 0x03 -> OpenSM 3.3.7
Jul 09 15:04:45 692960 [AA4F3700] 0x80 -> OpenSM 3.3.7
Jul 09 15:04:45 693149 [AA4F3700] 0x02 -> osm_vendor_init: 1000 pending umads specified
Jul 09 15:04:45 797977 [AA4F3700] 0x80 -> Entering DISCOVERING state
Jul 09 15:04:45 799152 [AA4F3700] 0x02 -> osm_vendor_bind: Binding to port 0xd6bed9fffebe94af
Jul 09 15:04:45 800414 [AA4F3700] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Jul 09 15:04:45 800422 [AA4F3700] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Jul 09 15:04:45 800425 [AA4F3700] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Jul 09 15:04:45 800430 [AA4F3700] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Jul 09 15:04:45 829702 [AA4F3700] 0x80 -> Exiting SM

I'd appreciate some pointers to help me understand where I'm going wrong.

Edited by kguest; asked by dhara.

1 Answer


OpenSM is not needed for RoCE devices, so failing to start OpenSM when you only have RoCE devices is expected.

rping failed because you did not specify a server address to connect to. Assuming your machine's RoCE-capable interfaces are at IPs 192.168.1.2 (server) and 192.168.1.3 (client), you should run the commands as follows:

server$ rping -s -a 192.168.1.2
client$ rping -c -a 192.168.1.2

Thanks,

--Shachar

  • Shachar, I tried the above suggestion. It seems it's not able to recognize the IP address. – dhara Jul 24 '15 at 23:26
  • server # rping -s -a localhost
    client # rping -c -a localhost
    rdma_resolve_route: No such device
    server # rping -s -a 10.213.41.231
    rdma_bind_addr: No such file or directory
    – dhara Jul 24 '15 at 23:26
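One thing worth checking before rerunning rping: `rping -s -a <addr>` binds to the given address, so that address has to be configured on a local interface, and for RoCE that interface should be the one an rxe device is attached to. Below is a minimal sketch of a hypothetical helper (`iface_for_ip` is not part of rping, Accelio or softRoCE) that looks the address up in `ip -o -4 addr show` output. It assumes iproute2's one-line output format, where the second field is the interface name and the fourth is ADDR/PREFIX.

```shell
#!/bin/sh
# Hypothetical helper (not part of rping or Accelio): print the interface
# that owns IPv4 address $1, reading `ip -o -4 addr show` output on stdin.
# Prints nothing if no local interface has that address.
iface_for_ip() {
    awk -v want="$1" '{
        split($4, a, "/")   # $4 looks like 192.168.1.2/24
        if (a[1] == want) { print $2; exit }
    }'
}

# Usage: sh check_ip.sh 192.168.1.2
if [ -n "$1" ]; then
    ifname=$(ip -o -4 addr show | iface_for_ip "$1")
    if [ -z "$ifname" ]; then
        echo "$1 is not assigned to any local interface; rping -s will not be able to bind to it" >&2
        exit 1
    fi
    echo "$1 is on interface $ifname; check that an rxe device is attached to it"
fi
```

If the address does map to an interface, the remaining question is whether an rxe device sits on that interface; the softRoCE user tools of that era report this with `rxe_cfg status`, and newer iproute2 releases with `rdma link`. Note that 127.0.0.1 lives on `lo`, which is normally not where rxe is configured; that is one plausible explanation for the failures against localhost, though not a confirmed one.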