
I have two computers, both with InfiniBand cards, that I would like to connect without a switch. A cable connects the two computers directly through their QSFP ports.

I have read the documentation and I see that opensm allows this. So far I have been working on node2, where I want to run the software switch (opensm). I can ping the ib0 address. Now I need to bring up the software switch, but I don't know how to modify these two files:

 1. /etc/sysconfig/opensm
 2. /etc/rdma/opensm.conf

After that, how do I tell node1 where the opensm switch is?

[idf@node2 ~]$ ibv_devinfo
hca_id: mlx4_0
    transport:          InfiniBand (0)
    fw_ver:             2.7.200
    node_guid:          0025:90ff:ff1a:0070
    sys_image_guid:         0025:90ff:ff1a:0073
    vendor_id:          0x02c9
    vendor_part_id:         26428
    hw_ver:             0xB0
    board_id:           SM_2092000001000
    phys_port_cnt:          1
        port:   1
            state:          PORT_DOWN (1)
            max_mtu:        4096 (5)
            active_mtu:     4096 (5)
            sm_lid:         0
            port_lid:       0
            port_lmc:       0x00
            link_layer:     InfiniBand


[idf@node2 ~]$ sudo service opensm status
Redirecting to /bin/systemctl status  opensm.service
opensm.service - Starts the OpenSM InfiniBand fabric Subnet Manager
   Loaded: loaded (/usr/lib/systemd/system/opensm.service; enabled)
   Active: active (running) since Mon 2015-04-20 20:51:10 EDT; 1h 21min ago
     Docs: man:opensm
  Process: 842 ExecStart=/usr/libexec/opensm-launch (code=exited, status=0/SUCCESS)
 Main PID: 846 (opensm-launch)
   CGroup: /system.slice/opensm.service
           ├─846 /bin/bash /usr/libexec/opensm-launch
           └─847 /usr/sbin/opensm

Apr 20 20:51:11 node2.synctrading opensm-launch[842]: Log File: /var/log/opensm.log
Apr 20 20:51:11 node2.synctrading opensm-launch[842]: -------------------------------------------------
Apr 20 20:51:11 node2.synctrading opensm-launch[842]: OpenSM 3.3.18
Apr 20 20:51:11 node2.synctrading OpenSM[847]: /var/log/opensm.log log file opened
Apr 20 20:51:11 node2.synctrading OpenSM[847]: OpenSM 3.3.18
Apr 20 20:51:12 node2.synctrading opensm-launch[842]: Using default GUID 0x2590ffff1a0071
Apr 20 20:51:12 node2.synctrading opensm-launch[842]: Entering DISCOVERING state
Apr 20 20:51:12 node2.synctrading OpenSM[847]: Entering DISCOVERING state
Apr 20 20:51:12 node2.synctrading OpenSM[847]: SM port is down
Apr 20 20:51:12 node2.synctrading opensm-launch[842]: SM port is down
[idf@node2 ~]$ 

[idf@node2 ~]$ sudo /etc/sysconfig/network-scripts/ifup-ib ib0

[idf@node2 ~]$ ifconfig -a

ib0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 2044
        inet 192.168.0.1  netmask 255.255.255.0  broadcast 192.168.0.255
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
        infiniband 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 33  bytes 3222 (3.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 33  bytes 3222 (3.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[idf@node2 ~]$ ibstat
CA 'mlx4_0'
    CA type: MT26428
    Number of ports: 1
    Firmware version: 2.7.200
    Hardware version: b0
    Node GUID: 0x002590ffff1a0070
    System image GUID: 0x002590ffff1a0073
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x0259086a
        Port GUID: 0x002590ffff1a0071
        Link layer: InfiniBand

[idf@node2 ~]$ sudo ibhosts 
Ca  : 0x002590ffff1a0070 ports 1 "node2 mlx4_0"
Ivan
  • I think you installed opensm correctly, and there's another problem. The port should show as PORT_INIT if there's a physical connection, even without an SM. – haggai_e Apr 22 '15 at 02:49
  • Yeah, not sure...I may need to upgrade my firmware... – Ivan Apr 23 '15 at 18:13
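The check haggai_e describes can be sketched as a small shell helper that reads `ibstat` output and separates a physical-link problem from a missing subnet manager. This is only a rough sketch: `port_diagnosis` is a hypothetical name (not part of infiniband-diags), and it assumes a single port whose `State:` and `Physical state:` lines look like the ones pasted above.

```shell
#!/bin/sh
# Hypothetical helper: summarise the port state from `ibstat` output on stdin.
# Assumes one port; with several ports the last one listed wins.
port_diagnosis() {
    awk -F': *' '
        /^[ \t]*State:/          { state = $2 }   # logical state: Down, Initializing, Armed, Active
        /^[ \t]*Physical state:/ { phys = $2 }    # physical state: Polling, LinkUp, ...
        END {
            if (phys == "LinkUp" && state == "Active")
                print "active: link is up and a subnet manager has configured the port"
            else if (phys == "LinkUp")
                print "link up, state " state ": physical link is fine, waiting for a subnet manager"
            else
                print "no link (physical state " phys "): check cable, peer port and firmware before opensm"
        }'
}

# Usage on a host with infiniband-diags installed:
#   ibstat | port_diagnosis
```

With the `ibstat` output from the question (State: Down, Physical state: Polling), this would report the "no link" case, which matches the "SM port is down" lines in the opensm log: the subnet manager cannot bring up a port that has no physical link.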

0 Answers