I have a question: is it possible to identify which slot holds a failed GPU card from within the Ubuntu operating system? We have a Supermicro GPU server with about 8 GPU cards for AI computing. Every now and then, users or a department report that a card is no longer visible in the 'nvidia-smi' command, and we have to go to the server room. These are generally hardware failures. We then have 7 cards working properly and have to find the faulty one by trial and error, pulling cards from the server one at a time. This is terribly tedious and time-consuming, so I am wondering whether it is possible to unambiguously identify the slot where the faulty card is located.
Thank you in advance.