I am writing a block device driver for linux. It is crucial to support unsafe removal (like usb unplug). In other words, I want to be able to shut down the block device without creating memory leaks / crashes even while applications hold open files or performing IO on my device or if it is mounted with file system. Surely unsafe removal would possibly corrupt the data which is stored on the device, but that is something the customers are willing to accept.
Here is the basics steps I have done:
- Upon unsafe removal, block device spawns a zombie which will automatically fail all new IO requests, ioctls, etc. The zombie substitutes make_request function and changes other function pointers so kernel would not need the original block device.
- Block device waits for all IO which is running now (and use my internal resources) to complete
- It does del_gendisk(); however this does not really free's kernel resources because they are still used.
- Block device frees itself.
- The zombie keeps track of the amount of opens() and close() on the block device and when last close() occurs it automatically free() itself
- Result - I am not leaking the blockdevice, request queue, gen disk, etc.
However this is a very difficult mechanism which requires a lot of code and is extremely prone to race conditions. I am still struggling with corner cases, per_cpu counting of io's and occasional crashes
My questions: Is there a mechanism in the kernel which already does that? I searched manuals, literature, and countless source code examples of block device drivers, ram disks and USB drivers but could not find a solution. I am sure, that I am not the first one to encounter this problem.
Edited: I learned from the answer below, by Dave S about the hot-plug mechanism but it does not help me. I need a solution of how to safely shut down the driver and not how to notify the kernel that driver was shut down.
Example of one problem: blk_queue_make_request() registers a function through which my block devices serves IO. In that function I increment per_cpu counters to know how many IO's are in flight by each cpu. However there is a race condition of function being called but counter was not increased yet, so my device thinks there are 0 IO's, releases the resources and then IO comes and crashes the system. Hotplug will not assist me with this problem as far as I understand