I've written some Linux device drivers but I am still at the level of newbie hack. I can get them working but that's all I can claim. So far, I've been able to work them into a model of write data using write() and read data using read(). I occasionally use ioctl for more fine-tuned control.
Now I want to build a coprocessing block in FPGA logic and write a device driver for the ARM processor in that same FPGA to offload work from the ARM to the FPGA. I'm having a hard time working out how best to design this interface.
If access to the coprocessor were exclusive, data could be written to the driver, the processing would happen in the FPGA fabric, and the data would be retrieved with a call to read(). However, exclusive access to the coprocessing hardware would be a waste. Ideally any user space process could use the hardware if it's available. I believe it would work if policy required user space processes to open the device, write data, read results, then close the file. But it seems like the overhead of opening and closing the file each time the coprocessor needs to be accessed would offset the benefit of offloading the work in the first place.
I understand that there is a world of issues to be dealt with inside the device driver code to safely handle multiple access to the hardware. But just from a high level, I would love to see a concept that would make this interface work and adhere to good practices for Linux device drivers.
Temporarily sweeping aside all complications, the ideal seems like a system where any process can open the device and have an access point where data is written to the device, perhaps in a blocking call, and data is read back after the coprocessor does its magic. The driver would handle the hardware accesses, and the calling processes could keep the device file open for as long as it's needed. Absolutely any insights or guidance would be greatly appreciated!
This is all extra information in case anyone cares or it's somehow useful or interesting:
This particular FPGA is a Zynq device from Xilinx. It has a dual-core ARM Cortex-A9 on the same silicon as the FPGA fabric (which is based on their Kintex family). The system is running Arch Linux for ARM and has done so quite beautifully for a year now. I use the generic name "coprocessor hardware" because the idea is that this chunk of hardware will gain capability over time while the user-space interface to its device driver remains fairly constant. You will be able to, for example, write 1024 samples, have this block perform a low-pass filtering operation, an FFT, etc., and get the results faster than the processor could have done on its own.
Thank you! This is my first question here so I apologize for breaches of protocol and inherent ignorance.
--Tim