
I have a block device driver that works, after a fashion. It is for a PCIe device, and because the device has no seek time I handle the bios directly with a make_request_fn rather than using a request queue. However, each transaction still carries overhead.

When I read consecutively from the device, I get bios with many segments (generally my maximum of 32), each consisting of 2 hardware sectors (2 * 2 KiB = 4 KiB). Each bio is then handled as one scatter-gather transaction to the device, which saves a lot of signaling overhead. On a write, however, each bio has just one segment of 2 sectors, so the operations take much longer in total. I would like either to cause the incoming bios to consist of many segments, or to merge the bios sensibly together myself. What is the right approach here?
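As a rough illustration of what merging the write bios could save, here is a userspace C model (not kernel code). It counts how many device transactions are needed if writes that are contiguous on the device are batched in submission order, up to 32 segments per batch. The `model_write` struct and `model_count_transactions` function are hypothetical names invented for this sketch; a real driver would also have to decide how long to hold a bio back while waiting for a neighbour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_SEGS 32   /* mirrors MYDEV_BLOCK_MAX_SEGS in the question */

/* Hypothetical stand-in for one single-segment write bio. */
struct model_write {
    uint64_t start_sector;  /* first 2 KiB hardware sector */
    uint32_t nr_sectors;    /* length in hardware sectors */
};

/*
 * Count the device transactions needed when device-contiguous writes are
 * batched in submission order, with at most MODEL_MAX_SEGS segments per
 * batch. Writes are never reordered.
 */
size_t model_count_transactions(const struct model_write *w, size_t n)
{
    size_t transactions = 0;
    size_t segs_in_batch = 0;
    uint64_t next_sector = 0;

    assert(w != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int contiguous = segs_in_batch > 0 &&
                         w[i].start_sector == next_sector;

        /* Start a new batch on a gap or when the current batch is full. */
        if (!contiguous || segs_in_batch == MODEL_MAX_SEGS) {
            transactions++;
            segs_in_batch = 0;
        }
        segs_in_batch++;
        next_sector = w[i].start_sector + w[i].nr_sectors;
    }
    return transactions;
}
```

In this model, 64 sequential two-sector writes need 2 transactions instead of 64, which is the saving the read path already gets.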

The current content of the make_request_fn is something along the lines of:

  • Determine whether the bio is a read or a write
  • For each segment in the bio, make an entry in a scatterlist with sg_set_page
  • Map this scatterlist for PCI DMA with pci_map_sg
  • For every segment in the scatterlist, add an entry to a device-specific structure defining a multiple-segment DMA scatter-gather operation
  • Map that structure for DMA
  • Carry out the transaction
  • Unmap the structure and the scatterlist
  • Call bio_endio with -EIO on failure or 0 on success
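The descriptor-building and status steps above can be sketched as a userspace C model. This is not the driver's code: the layout of the device-specific structure is not given in the question, so `model_seg`, `model_sg_op` and `model_build_op` are hypothetical, and the DMA mapping calls are left out entirely. The return value follows the same convention as the status passed to bio_endio (0 or -EIO).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_SEGS 32   /* mirrors MYDEV_BLOCK_MAX_SEGS in the question */

/* Hypothetical: one already-mapped segment (bus address and length). */
struct model_seg {
    uint64_t dma_addr;
    uint32_t len;
};

/* Hypothetical device descriptor for one multi-segment operation. */
struct model_sg_op {
    int is_write;
    uint32_t nr_entries;
    uint32_t total_bytes;
    struct model_seg entries[MODEL_MAX_SEGS];
};

/*
 * Fill in one scatter-gather operation from a list of segments.
 * Returns 0 on success or -EIO if the segment list cannot be
 * expressed as a single operation.
 */
int model_build_op(struct model_sg_op *op, int is_write,
                   const struct model_seg *segs, size_t nr_segs)
{
    assert(op != NULL);

    if (segs == NULL || nr_segs == 0 || nr_segs > MODEL_MAX_SEGS)
        return -EIO;

    op->is_write = is_write;
    op->nr_entries = 0;
    op->total_bytes = 0;

    for (size_t i = 0; i < nr_segs; i++) {
        if (segs[i].len == 0)
            return -EIO;            /* empty segments are rejected */
        op->entries[i] = segs[i];
        op->nr_entries++;
        op->total_bytes += segs[i].len;
    }
    return 0;
}
```

With 32 segments of 4 KiB each, one operation carries 128 KiB; a write bio with a single segment fills only one entry of the same structure.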

The request queue is set up like:

#define MYDEV_BLOCK_MAX_SEGS 32
#define MYDEV_SECTOR_SIZE 2048

/* Handle bios directly, bypassing the request queue's I/O scheduler. */
blk_queue_make_request(mydev->queue, mydev_make_req);

/* Non-rotational device: no seek penalty. */
set_bit(QUEUE_FLAG_NONROT, &mydev->queue->queue_flags);
blk_queue_max_segments(mydev->queue, MYDEV_BLOCK_MAX_SEGS);
blk_queue_physical_block_size(mydev->queue, MYDEV_SECTOR_SIZE);
blk_queue_logical_block_size(mydev->queue, MYDEV_SECTOR_SIZE);

/* No cache flush or FUA support advertised. */
blk_queue_flush(mydev->queue, 0);

/* All-ones mask: no segment boundary restriction. */
blk_queue_segment_boundary(mydev->queue, -1UL);
/* Mask 0x7: buffers must be 8-byte aligned. */
blk_queue_dma_alignment(mydev->queue, 0x7);
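To make the arithmetic behind these limits concrete, here is a small userspace C sketch. It assumes every segment has the same size, as observed on the read path (4 KiB), which the queue settings themselves do not guarantee. The helper names `model_max_transfer_bytes` and `model_dma_aligned` are hypothetical; the alignment check only mirrors how a mask such as 0x7 is interpreted.

```c
#include <assert.h>
#include <stdint.h>

#define MYDEV_BLOCK_MAX_SEGS 32
#define MYDEV_SECTOR_SIZE 2048

/*
 * Upper bound on the bytes carried by one transaction, assuming every
 * segment is seg_bytes long (a whole number of hardware sectors).
 */
uint32_t model_max_transfer_bytes(uint32_t seg_bytes)
{
    assert(seg_bytes % MYDEV_SECTOR_SIZE == 0);
    return MYDEV_BLOCK_MAX_SEGS * seg_bytes;
}

/*
 * A buffer satisfies an alignment mask when neither its address nor its
 * length has any of the masked low bits set.
 */
int model_dma_aligned(uint64_t addr, uint32_t len, uint64_t mask)
{
    return ((addr | len) & mask) == 0;
}
```

Under that assumption, a full 32-segment bio moves 128 KiB (64 hardware sectors) per transaction, while a single-segment write moves 4 KiB.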
Inductiveload
