We are using a custom board running version 3.2 of the kernel, but we've come across some issues when testing the robustness of the MMC layer.
First of all, our MMC slot does not have a card detection pin. The issue in question consists of the following:
- Load the module (omap_hsmmc). The card is detected on power-up, and is mounted appropriately.
- Read something from the SD card (i.e.
cat foo.txt
) - While the file is being read, remove the card.
- After many failed attempts, the system hangs.
Now, I've tracked the issue to the following code section, in drivers/mmc/card/queue.c
:
static int mmc_queue_thread(void *d)
{
struct mmc_queue *mq = d;
struct request_queue *q = mq->queue;
current->flags |= PF_MEMALLOC;
down(&mq->thread_sem);
do {
struct request *req = NULL;
struct mmc_queue_req *tmp;
spin_lock_irq(q->queue_lock);
set_current_state(TASK_INTERRUPTIBLE);
req = blk_fetch_request(q);
mq->mqrq_cur->req = req;
spin_unlock_irq(q->queue_lock);
if (req || mq->mqrq_prev->req) {
set_current_state(TASK_RUNNING);
mq->issue_fn(mq, req);
} else {
if (kthread_should_stop()) {
set_current_state(TASK_RUNNING);
break;
}
up(&mq->thread_sem);
schedule();
down(&mq->thread_sem);
}
/* Current request becomes previous request and vice versa. */
mq->mqrq_prev->brq.mrq.data = NULL;
mq->mqrq_prev->req = NULL;
tmp = mq->mqrq_prev;
mq->mqrq_prev = mq->mqrq_cur;
mq->mqrq_cur = tmp;
} while (1);
up(&mq->thread_sem);
return 0;
}
Investigating this code, I've found that it hangs inside the mq->issue_fn(mq, req)
call. This function prepares and issues the appropriate commands to fufill the request passed into it, and it is aware of any errors that happens when it can't communicate with the card. The error is handled in an awkward (in my opinion) manner, and does not 'bubble up' to the mmc_queue_thread
. However, I've checked my version's code against that of the most recent kernel release (4.9), and I did not see any difference in the treatment of such errors, apart from a better separation of each error case (the treatment is very similar).
I suspect that the issue is caused by the inability ofthe upper layers to stop making new requests to read from the MMC card.
What I've Tried:
- Rewrote the code so that the error is passed up so that I can do
ret = mq->issue_fn(mq, req)
. - Being able to identify the specific errors, I tried to cancel the thread in different ways: calling
kthread_stop
, callingmmc_queue_suspend
,__blk_end_request
, and others. With these, the most I've been able to accomplish was keep the thread in a "harmless" state, where it still existed, but did not consume any resources. However, the user space program that triggered the call does not return, locking up in an uninterruptible state.
My questions:
- Is the best way to solve this by stopping the upper layers from making new requests? Or should the thread itself be 'killed off'?
- In a case such as mine, should it be assumed that because there is no card detection pin, the card should not be removed?
Update: I've found out that you can tell the driver to use polling to detect card insertion/removal. It can be done by adding the MMC_CAP_NEEDS_POLL
flag to the mmc's .caps
on driver initialization (on newer kernels, you can use the broken-cd
property on your DT). However, the problem still happens after this modification.