
Does anyone know if it is possible to read from a socket directly into a location in a .NET memory-mapped file? This happens to be a non-persistent memory-mapped file (purely in memory, no associated disk file), if that helps.

Context: I'm trying to implement minimal-copying code in a situation where I have large memory-mapped objects to move from node to node in a distributed system. I'm using UDP via the standard socket library, so I had hoped to be able to read, say, 64KB off a socket (I'm on an Infiniband network, so the MTU can be quite large) into a location at some offset into the memory-mapped region.

So far everything I'm finding seems to involve doing a copy operation -- reading my data first, then copying it into the memory-mapped file. So that's what I would like to avoid: that copy operation. In fact I have the same issue for sending: I would like to send directly from the memory-mapped file.

While copying may seem minor, the objects I'm working with are huge (a cloud scenario): massive numbers of multi-gigabyte memory-mapped objects, which I would like to treat as byte arrays. So those extra copying operations could be expensive for me. In fact my real goal is to use Infiniband verbs and do direct DMA transfer from a memory-mapped region on machine A to a memory-mapped region on machine B, bypassing UDP entirely.

Any pointers would be VERY appreciated!

(More details: These are applications on big clusters, 64-bit machines, and while my code is in .NET written in C#, the applications creating these memory-mapped objects are mostly in C++ or other languages -- think of them as Hadoop or MapReduce tasks, for example, and the files as huge images, or concatenated web pages, stuff like that. So those applications produce these files -- maybe output from a Map step -- and now they need to be "shuffled" to the right places. This is the specific thing I'm trying to do... ideally with my code still living in .NET/C#, simply because I like C# in .NET. I cross-compile with Mono for Linux... which is actually what they run on these clusters.)

Ken Birman
  • If you're truly dealing with many gigs of data in a single object, you're going to have to deal with them in a streaming fashion from socket to disk anyway; a single object in a .NET process can be no larger than 2GB, and in a 32-bit environment that's as large as the entire process, executing code and all, is allowed to be. – KeithS Jul 18 '13 at 20:06
  • No disk. Everything is always in memory -- sender side and receiver side too. The cloud applications don't have time to deal with disk latencies so we spread our data out on many nodes, memory-mapped, and operate on it directly in memory. The real applications are coded mostly in languages like C++ but they map their files and I can access them as memory-mapped segments with gigabytes of bytes. My job is to shuffle the stuff around as needed, replicate at high speed, etc. But copying is a killer at these sizes of objects. – Ken Birman Jul 18 '13 at 20:16
  • For a super big thing, I would just see more than one memory-mapped file. I would basically page them in and out of my address space. So let's start with a single 2G object and then we can talk about multiple ones... (Anyhow, I'm on 64-bit machines) – Ken Birman Jul 18 '13 at 20:19
  • The problem is that .NET, being a managed-memory runtime, sharply limits what you're allowed to do with memory pointers. Any code explicitly using variables of a pointer type (e.g. `byte*`, `void*`) is inherently "unsafe" and requires full trust, and there are *still* restrictions imposed by the OS on the memory resources a process is allowed to request. – KeithS Jul 18 '13 at 20:24
  • Instead, if you're going to work in .NET, you're encouraged to work with things the runtime and framework give you to manage these large objects, like MemoryStreams. By choosing to use .NET you've basically given up on screaming performance and are instead focused on maintainability and portability of code. If you have apps in C++ that do this, I'd recommend staying unmanaged unless there's a compelling reason to move into the sandbox. – KeithS Jul 18 '13 at 20:26
  • Ok, restrictions. But as in "can't be done"? Or as in "can only be done in some weird, evasive way"? I've been thinking I might write an unsafe "device driver" in unmanaged C++, for example, that could map the file (seeing it directly) and then talk directly to the native O/S memory-mapper and device drivers. But if I could pull this off entirely in .NET I would prefer to do so. – Ken Birman Jul 18 '13 at 20:26
  • If you're considering writing something low-level enough to get kernel access to the memory-mapper, this is about a dozen levels of abstraction closer to the metal than .NET will ever get you. Probably the closest I've ever seen .NET get to unmanaged memory access like this is when working with COM. So, you could create your unmanaged C++ program for low-level memory reads/writes, and pass in pointers to .NET code that can marshal them into types you can work with inside the sandbox. Again, if you need to get this close to the metal, I strongly recommend staying in unmanaged C++. – KeithS Jul 18 '13 at 20:30
  • Well, that can certainly work -- a memory-mapped file can be shared with an unmanaged application; in fact this is precisely how I was planning to get to them myself, and I might not even need to be able to "see" their contents for most purposes. So I could certainly run a C++ helper app. Kind of a pain -- my code will have to fork this thing off and it will make installation and management of the library kind of awkward.... – Ken Birman Jul 18 '13 at 20:37
  • With SocketAsyncEventArgs http://msdn.microsoft.com/en-us/library/system.net.sockets.socketasynceventargs.aspx you can specify the buffer you want to use, but as far as I know you only get a stream with memory-mapped files in .NET. Lower-level socket communication needs to write directly to memory, so I don't think you can do what you want with managed memory-mapped files. – Peter Ritchie Jul 18 '13 at 22:00
  • I know this is old, but in dotnet core (minimally 3.1, not sure when added) you can get an unmanaged pointer to the memory-mapped file from `MemoryMappedViewStream` via its `SafeHandle`, then convert the pointer into a `byte[]` (can use `Unsafe.As(ptr)`) and then use this byte array as your buffer. The data from the socket will read directly into that memory address. This *should* work – mhand Aug 28 '21 at 10:02

1 Answer


Clearly Keith deserves credit for pointing me in the right direction, but to summarize the answer: what would work best for me is to write a little module in C or C++, compile it to create a DLL that would be part of my assembly, and load it (this can be done dynamically or statically; the latter is easier, of course). I can then call from C# into C, and my C code can just do the raw system calls using true addresses. Since memory-mapped files (even the ones with no persistent storage) don't move around due to garbage collection or consolidation, this is an unusually easy case. In fact the C code doesn't even need to learn the address from the C# code -- it can just "remap" the same file, although passing the address should be easy enough. Moreover, this can be just as portable as my original C# code, when you come down to it.
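A minimal sketch of what such a helper might look like on Linux, using only POSIX calls. Everything here is illustrative rather than the code actually used: the function names (`map_region`, `udp_bind_loopback`, `recv_into_region`, `send_from_region`) are hypothetical, an anonymous shared mapping stands in for the real memory-mapped object, and a loopback UDP socket stands in for the cluster network. The point it shows is that `recv` and `sendto` accept any address inside the mapping, so no intermediate user-space buffer is needed. The kernel still copies between its socket buffers and the mapping, so this is minimal-copy rather than zero-copy.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a shared, memory-only region (no backing disk file).
 * Stand-in for the real memory-mapped object. Returns NULL on failure. */
void *map_region(size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Create a UDP socket bound to 127.0.0.1 on an ephemeral port.
 * The address actually bound is written to *addr_out. Returns fd or -1. */
int udp_bind_loopback(struct sockaddr_in *addr_out) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */

    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *addr_out = addr;
    return fd;
}

/* Receive one datagram directly into region + offset.
 * Returns bytes received, or -1 on error or if the range is out of bounds. */
ssize_t recv_into_region(int fd, void *region, size_t region_size,
                         size_t offset, size_t max_len) {
    if (offset > region_size || max_len > region_size - offset) return -1;
    return recv(fd, (char *)region + offset, max_len, 0);
}

/* Send len bytes directly from region + offset as one datagram.
 * Returns bytes sent, or -1 on error or if the range is out of bounds. */
ssize_t send_from_region(int fd, const struct sockaddr_in *dest,
                         const void *region, size_t region_size,
                         size_t offset, size_t len) {
    if (offset > region_size || len > region_size - offset) return -1;
    return sendto(fd, (const char *)region + offset, len, 0,
                  (const struct sockaddr *)dest, sizeof *dest);
}
```

Compiled as a shared library, functions like these could presumably be reached from C# through P/Invoke, with the C# side passing either the mapping's address or just the information the C side needs to remap the same object itself. The bounds checks matter more than usual here, since a bad offset would write past the mapping rather than raise a managed exception.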

In contrast doing this without a helper procedure or two would be a real pain.

Keith, if you want to repost this under your name, I'll delete my summary and "vote" you to the top (you do deserve the points...)

Ken Birman