
I'm having trouble with NFS over RoCE on Ubuntu 16.04 using the latest OFED package provided by Mellanox (MLNX_OFED_LINUX-3.3-1.0.4.0-ubuntu16.04-x86_64.tgz). My cards are Mellanox 10GbE and are RoCE v1 enabled.

Works with Inbox drivers/software but not so much with the latest OFED

I managed to get NFS working with RoCE by following the docs on this site using the Inbox drivers/software (included with Ubuntu 16.04). I was having a few minor issues, and I know the Ubuntu packages are quite out of date, so I wanted to install the latest OFED/mlx4 drivers as recommended on mellanox.com. So I did that, and all went as planned: IP functionality is all there and the RDMA tools/tests all work. Everything seems to work great, except one thing.

The svcrdma and xprtrdma modules won't load, so there is no RDMA support for NFS. The errors I get are listed at the end of this post. I also get the same errors if I install only the latest mlx4 driver from the Mellanox site and leave the rest of the packages alone.

I have a feeling this can be resolved somehow, perhaps by recompiling kernel modules, but that is over my head at the moment. Or maybe I just messed something up (crossing fingers)? Can anyone help?

Someone commented on this Mellanox community article that they had the same issue with Ubuntu 14.04: https://community.mellanox.com/docs/DOC-2132 According to the same document, it should work just fine with CentOS 7. What's the difference?

The end result I want is to have the latest driver and software (preferably) working on Ubuntu 16.04 with NFS over RoCE. If not the latest OFED package, at least the latest mlx4 driver. I read somewhere that newer kernel versions will have updated drivers and RDMA code (I forgot most of what I read). If this goes nowhere, my answer may have to be to wait for a newer Ubuntu release.

Thanks

Error messages when loading modules

NFS server:

# modprobe svcrdma 
modprobe: ERROR: could not insert 'rpcrdma': Invalid argument 

dmesg errors:

[105699.696980] rpcrdma: Unknown symbol rdma_event_msg (err 0) 
[105699.697056] rpcrdma: disagrees about version of symbol ib_create_cq 
[105699.697059] rpcrdma: Unknown symbol ib_create_cq (err -22) 
[105699.697069] rpcrdma: disagrees about version of symbol rdma_resolve_addr 
[105699.697071] rpcrdma: Unknown symbol rdma_resolve_addr (err -22) 
[105699.697183] rpcrdma: Unknown symbol ib_event_msg (err 0) 
[105699.697213] rpcrdma: disagrees about version of symbol ib_dereg_mr 
[105699.697215] rpcrdma: Unknown symbol ib_dereg_mr (err -22) 
[105699.697224] rpcrdma: disagrees about version of symbol ib_query_qp 
[105699.697226] rpcrdma: Unknown symbol ib_query_qp (err -22) 
[105699.697236] rpcrdma: disagrees about version of symbol rdma_disconnect 
[105699.697238] rpcrdma: Unknown symbol rdma_disconnect (err -22) 
[105699.697245] rpcrdma: disagrees about version of symbol ib_alloc_fmr 
[105699.697247] rpcrdma: Unknown symbol ib_alloc_fmr (err -22) 
[105699.697294] rpcrdma: disagrees about version of symbol ib_dealloc_fmr 
[105699.697295] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22) 
[105699.697301] rpcrdma: disagrees about version of symbol rdma_resolve_route 
[105699.697303] rpcrdma: Unknown symbol rdma_resolve_route (err -22) 
[105699.697398] rpcrdma: disagrees about version of symbol rdma_bind_addr 
[105699.697400] rpcrdma: Unknown symbol rdma_bind_addr (err -22) 
[105699.697441] rpcrdma: disagrees about version of symbol rdma_create_qp 
[105699.697443] rpcrdma: Unknown symbol rdma_create_qp (err -22) 
[105699.697479] rpcrdma: Unknown symbol ib_map_mr_sg (err 0) 
[105699.697487] rpcrdma: disagrees about version of symbol ib_destroy_cq 
[105699.697489] rpcrdma: Unknown symbol ib_destroy_cq (err -22) 
[105699.697494] rpcrdma: disagrees about version of symbol rdma_create_id 
[105699.697496] rpcrdma: Unknown symbol rdma_create_id (err -22) 
[105699.697582] rpcrdma: disagrees about version of symbol rdma_listen 
[105699.697584] rpcrdma: Unknown symbol rdma_listen (err -22) 
[105699.697587] rpcrdma: disagrees about version of symbol rdma_destroy_qp 
[105699.697589] rpcrdma: Unknown symbol rdma_destroy_qp (err -22) 
[105699.697597] rpcrdma: disagrees about version of symbol ib_query_device 
[105699.697599] rpcrdma: Unknown symbol ib_query_device (err -22) 
[105699.697606] rpcrdma: disagrees about version of symbol ib_get_dma_mr 
[105699.697607] rpcrdma: Unknown symbol ib_get_dma_mr (err -22) 
[105699.697617] rpcrdma: disagrees about version of symbol ib_alloc_pd 
[105699.697618] rpcrdma: Unknown symbol ib_alloc_pd (err -22) 
[105699.697673] rpcrdma: Unknown symbol ib_alloc_mr (err 0) 
[105699.697734] rpcrdma: disagrees about version of symbol rdma_connect 
[105699.697736] rpcrdma: Unknown symbol rdma_connect (err -22) 
[105699.697769] rpcrdma: Unknown symbol ib_wc_status_msg (err 0) 
[105699.697842] rpcrdma: disagrees about version of symbol rdma_destroy_id 
[105699.697844] rpcrdma: Unknown symbol rdma_destroy_id (err -22) 
[105699.697872] rpcrdma: disagrees about version of symbol rdma_accept 
[105699.697874] rpcrdma: Unknown symbol rdma_accept (err -22) 
[105699.697882] rpcrdma: disagrees about version of symbol ib_destroy_qp 
[105699.697883] rpcrdma: Unknown symbol ib_destroy_qp (err -22) 
[105699.697964] rpcrdma: disagrees about version of symbol ib_dealloc_pd 
[105699.697965] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)
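
The log above mixes two kinds of failure. "disagrees about version of symbol" means the symbol is exported by the loaded modules but its version checksum does not match what rpcrdma was built against. "Unknown symbol ... (err 0)" means the symbol was not found at all (five symbols here: rdma_event_msg, ib_event_msg, ib_map_mr_sg, ib_alloc_mr, ib_wc_status_msg). That pattern suggests the rpcrdma module on disk was built against a different RDMA core than the one the OFED package installed, though this is an inference from the log, not something confirmed. A minimal sketch for condensing such output follows; the function name summarize_symbol_errors is made up for illustration and only assumes awk and sort are available.

```shell
#!/bin/sh
# Condense rpcrdma symbol errors from dmesg-style text read on stdin.
# Prints one line per symbol: "version-mismatch <sym>" or "missing <sym>".
# summarize_symbol_errors is a hypothetical helper name, not an existing tool.
summarize_symbol_errors() {
    awk '
        # Symbol exists but its version checksum differs: name is the last field.
        /disagrees about version of symbol/ { mismatch[$NF] = 1 }
        # Symbol not found at all: name is the third field from the end.
        /Unknown symbol .* \(err 0\)/       { missing[$(NF-2)] = 1 }
        END {
            for (s in mismatch) print "version-mismatch", s
            for (s in missing)  print "missing", s
        }
    ' | sort
}
```

It can be fed directly from the kernel log, for example `dmesg | grep rpcrdma | summarize_symbol_errors`. The "Unknown symbol ... (err -22)" lines are skipped on purpose, since each one only repeats the version mismatch reported on the line before it.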

NFS client:

# modprobe xprtrdma          
modprobe: ERROR: could not insert 'rpcrdma': Invalid argument

dmesg errors:

[106055.692454] rpcrdma: Unknown symbol rdma_event_msg (err 0) 
[106055.692480] rpcrdma: disagrees about version of symbol ib_create_cq 
[106055.692481] rpcrdma: Unknown symbol ib_create_cq (err -22) 
[106055.692484] rpcrdma: disagrees about version of symbol rdma_resolve_addr 
[106055.692485] rpcrdma: Unknown symbol rdma_resolve_addr (err -22) 
[106055.692520] rpcrdma: Unknown symbol ib_event_msg (err 0) 
[106055.692529] rpcrdma: disagrees about version of symbol ib_dereg_mr 
[106055.692530] rpcrdma: Unknown symbol ib_dereg_mr (err -22) 
[106055.692532] rpcrdma: disagrees about version of symbol ib_query_qp 
[106055.692533] rpcrdma: Unknown symbol ib_query_qp (err -22) 
[106055.692536] rpcrdma: disagrees about version of symbol rdma_disconnect 
[106055.692536] rpcrdma: Unknown symbol rdma_disconnect (err -22) 
[106055.692538] rpcrdma: disagrees about version of symbol ib_alloc_fmr 
[106055.692539] rpcrdma: Unknown symbol ib_alloc_fmr (err -22) 
[106055.692552] rpcrdma: disagrees about version of symbol ib_dealloc_fmr 
[106055.692553] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22) 
[106055.692554] rpcrdma: disagrees about version of symbol rdma_resolve_route 
[106055.692555] rpcrdma: Unknown symbol rdma_resolve_route (err -22) 
[106055.692565] rpcrdma: disagrees about version of symbol rdma_bind_addr 
[106055.692565] rpcrdma: Unknown symbol rdma_bind_addr (err -22) 
[106055.692573] rpcrdma: disagrees about version of symbol rdma_create_qp 
[106055.692574] rpcrdma: Unknown symbol rdma_create_qp (err -22) 
[106055.692583] rpcrdma: Unknown symbol ib_map_mr_sg (err 0) 
[106055.692585] rpcrdma: disagrees about version of symbol ib_destroy_cq 
[106055.692585] rpcrdma: Unknown symbol ib_destroy_cq (err -22) 
[106055.692587] rpcrdma: disagrees about version of symbol rdma_create_id 
[106055.692587] rpcrdma: Unknown symbol rdma_create_id (err -22) 
[106055.692613] rpcrdma: disagrees about version of symbol rdma_listen 
[106055.692614] rpcrdma: Unknown symbol rdma_listen (err -22) 
[106055.692615] rpcrdma: disagrees about version of symbol rdma_destroy_qp 
[106055.692615] rpcrdma: Unknown symbol rdma_destroy_qp (err -22) 
[106055.692617] rpcrdma: disagrees about version of symbol ib_query_device 
[106055.692618] rpcrdma: Unknown symbol ib_query_device (err -22) 
[106055.692619] rpcrdma: disagrees about version of symbol ib_get_dma_mr 
[106055.692620] rpcrdma: Unknown symbol ib_get_dma_mr (err -22) 
[106055.692622] rpcrdma: disagrees about version of symbol ib_alloc_pd 
[106055.692623] rpcrdma: Unknown symbol ib_alloc_pd (err -22) 
[106055.692638] rpcrdma: Unknown symbol ib_alloc_mr (err 0) 
[106055.692657] rpcrdma: disagrees about version of symbol rdma_connect 
[106055.692658] rpcrdma: Unknown symbol rdma_connect (err -22) 
[106055.692668] rpcrdma: Unknown symbol ib_wc_status_msg (err 0) 
[106055.692690] rpcrdma: disagrees about version of symbol rdma_destroy_id 
[106055.692690] rpcrdma: Unknown symbol rdma_destroy_id (err -22) 
[106055.692698] rpcrdma: disagrees about version of symbol rdma_accept 
[106055.692699] rpcrdma: Unknown symbol rdma_accept (err -22) 
[106055.692701] rpcrdma: disagrees about version of symbol ib_destroy_qp 
[106055.692701] rpcrdma: Unknown symbol ib_destroy_qp (err -22) 
[106055.692724] rpcrdma: disagrees about version of symbol ib_dealloc_pd 
[106055.692725] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)
Ryan Babchishin
