
I'm trying to deploy a Ceph cluster on Ubuntu 16.04 using the ceph/daemon:latest-luminous image. It worked perfectly at first without RDMA. After I changed the configuration to enable RDMA, I entered the container and ran ceph -s, which failed with the error below:

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: In function 'void RDMAConnectedSocketImpl::handle_connection()' thread 7f4f12a91700 time 2021-11-08 01:41:37.221315
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.13/rpm/el7/BUILD/ceph-12.2.13/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: 220: FAILED assert(!r)
2021-11-08 01:41:37.221289 7f4f12a91700 -1  RDMAConnectedSocketImpl activate failed to transition to RTR state: (22) Invalid argument
 ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f4f200ad6b0]
 2: (RDMAConnectedSocketImpl::handle_connection()+0xb1b) [0x7f4f2025108b]
 3: (EventCenter::process_events(int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x359) [0x7f4f20238499]
 4: (()+0x42cbce) [0x7f4f2023bbce]
 5: (()+0xb5330) [0x7f4f1e0fb330]
 6: (()+0x7ea5) [0x7f4f2b529ea5]
 7: (clone()+0x6d) [0x7f4f2ab49b0d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I don't know why this happens, because testing the RDMA connection with rping works fine. Here is my configuration. Is there anything wrong with it?

ms_type = async+rdma
ms_cluster_type = async+rdma
ms_async_rdma_device_name = mlx5_0
ms_async_rdma_polling_us = 0
ms_async_rdma_local_gid = fe80:0000:0000:0000:ec0d:9a03:00ca:31d8
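
One thing that can be checked from inside the container is whether the configured ms_async_rdma_local_gid is actually present in the device's GID table. The following is a minimal Python sketch, not a diagnosis of the assert. It assumes the usual Linux sysfs layout (/sys/class/infiniband/<device>/ports/<port>/gids/<index>) is visible in the container and that port 1 is the one in use. The function names normalize_gid and find_gid_indexes are hypothetical helpers written for this example.

```python
import ipaddress
from pathlib import Path


def normalize_gid(gid: str) -> str:
    """Return the GID fully expanded and lowercase (8 groups of 4 hex digits)."""
    return ipaddress.IPv6Address(gid.strip()).exploded


def find_gid_indexes(device, gid, port=1, sysfs_root="/sys/class/infiniband"):
    """Return the sorted GID table indexes whose value equals `gid`.

    Assumption: sysfs exposes one file per GID index under
    <sysfs_root>/<device>/ports/<port>/gids/. Port 1 is a guess.
    """
    gid_dir = Path(sysfs_root) / device / "ports" / str(port) / "gids"
    if not gid_dir.is_dir():
        return []  # device, port, or sysfs not visible from here
    wanted = normalize_gid(gid)
    matches = []
    for entry in gid_dir.iterdir():
        try:
            if normalize_gid(entry.read_text()) == wanted:
                matches.append(int(entry.name))
        except (OSError, ValueError):
            # Unreadable or non-GID entries are skipped.
            continue
    return sorted(matches)


if __name__ == "__main__":
    # Values taken from the configuration above.
    configured = "fe80:0000:0000:0000:ec0d:9a03:00ca:31d8"
    print(find_gid_indexes("mlx5_0", configured))
```

An empty list would mean the configured GID was not found for that device and port (or that sysfs is not exposed in the container). A non-empty list only confirms that the value exists in the table, not that the RDMA setup is otherwise correct.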
