There are several projects/attempts to add interfaces to memory-to-memory DMA engines intended for use in HPC (MPI):
KNEM may use the Intel I/OAT DMA engine on some microarchitectures and for some message sizes:
I/OAT copy offload through DMA Engine
One interesting asynchronous feature is certainly I/OAT copy offload.
icopy.flags = KNEM_FLAG_DMA;
Some authors say that the hardware DMA engine brings no benefit on newer Intel microarchitectures:
http://www.ipdps.org/ipdps2010/ipdps2010-slides/CAC/slides_cac_Mor10OptMPICom.pdf
I/OAT only useful for obsolete architectures
CMA (Cross Memory Attach) was announced as a project similar to KNEM: http://www.open-mpi.org/community/lists/devel/2012/01/10208.php
These system calls were designed to permit fast message passing by
allowing messages to be exchanged with a single copy operation
(rather than the double copy that would be required when using, for
example, shared memory or pipes).
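As a rough illustration of the single-copy idea, here is a minimal sketch using process_vm_readv(), one of the CMA system calls (Linux 3.2+, glibc 2.15+). The helper names cma_read and cma_self_test are made up for this example. It reads from the caller's own address space only to stay self-contained; in a real MPI-style setup the pid and remote address would come from the peer process, and the call may be denied (EPERM) or missing (ENOSYS) depending on ptrace permissions or sandboxing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes from remote_addr in process pid
 * into dst with one process_vm_readv() call (a single kernel copy).
 * Returns the number of bytes copied, or -1 with errno set. */
static ssize_t cma_read(pid_t pid, void *dst, const void *remote_addr, size_t len)
{
    assert(dst != NULL && remote_addr != NULL);

    struct iovec local  = { .iov_base = dst,                  .iov_len = len };
    struct iovec remote = { .iov_base = (void *)remote_addr,  .iov_len = len };

    /* One local and one remote segment; the flags argument must be 0. */
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}

/* Hypothetical demo: read a buffer from our own process.
 * Returns 0 on success, 1 if the syscall is unavailable or denied,
 * and -1 on any other failure or a content mismatch. */
static int cma_self_test(void)
{
    char src[] = "single-copy message";
    char dst[sizeof src];

    memset(dst, 0, sizeof dst);
    ssize_t n = cma_read(getpid(), dst, src, sizeof src);
    if (n < 0)
        return (errno == ENOSYS || errno == EPERM) ? 1 : -1;

    return (n == (ssize_t)sizeof src && memcmp(src, dst, sizeof src) == 0) ? 0 : -1;
}
```

Note that this copy is done by the CPU inside the kernel; it removes the second copy but does not involve a DMA engine.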
If you can, avoid sockets (especially TCP sockets) for transferring data: they carry high software overhead that is unnecessary when both processes run on a single machine. The standard skb size limit may also be too small to use I/OAT effectively, so the network stack probably will not use I/OAT.