
The goal of this question is to ask whether it is possible to take some compiled code (think of an ordinary program, not necessarily written in any special way [e.g. multi-threaded] or in any particular paradigm/language), send it over the network, and have it processed by a CPU on another machine.

OK, this involves a lot of concepts, and I am not particularly familiar with either distributed computing or kernel/OS internals, so pardon me if this question seems too broad or unfocused; I will do my best to stay on track.

Let's say we have the assembly code (instructions) for a function in our program. It's a simple function that takes x and outputs y by adding 1 to x. I know that at the execution level the CPU needs to fetch the value of x, move it into a register, perform the addition, and then execute a RET instruction at the end.
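To make this concrete, here is a minimal sketch of such a function in C (the name `compute` is purely illustrative), with the kind of x86-64 instructions a compiler typically emits for it shown as comments:

```c
/* A minimal sketch of the function described in the question: y = x + 1.
 * On x86-64 (System V calling convention), an optimizing compiler
 * typically emits something like:
 *
 *     lea eax, [rdi + 1]   ; result = argument + 1 (argument arrives in edi/rdi)
 *     ret                  ; return; the result is in eax
 *
 * The instruction bytes themselves are tiny; the question is what else
 * would have to travel with them so another machine could run them.
 */
int compute(int x) {
    return x + 1;
}
```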

Would it be possible, conceptually, to send the instructions over the network along with whatever contextual information is needed for execution? If so, what would that information be? The instructions plus the initial state of the CPU registers, or even more?

I guess the kernel would be deeply involved in coordinating such a process, but what I am mostly struggling to figure out is the minimum 'package' of information I would need to assemble into a message so that a computer at the other end of the network could perform this simple calculation, or whether this simply makes no sense given the constraints of PC architectures.

There is a lot of information about distributed computing out there, but it mostly takes for granted that the code is designed in a specific way. I am interested in a similar solution for arbitrary, already existing code.

  • It's very complicated to even try to answer this question, as it's very vague... Are we talking about moving a single function to run elsewhere? Is this a function that is currently running? Or do we want to transparently marshal a function call so that it runs elsewhere? After the function call, is control flow to return to the original machine, or should the process just carry on on the new one? How should multiple threads be handled then? Is this function expected to be pure? How should I/O work? What architecture are we talking about? Run-of-the-mill current-day Linux on x86? Micros? – Matteo Italia Jan 05 '20 at 02:53
  • If you know the target OS and architecture, you can compile a dynamic library at runtime (.dll for Windows, .so for Linux, .dylib for macOS) that exports a function with a specific name and prototype, e.g. `int __cdecl compute(int x)`. Send that library to the server. On the server, load it and call that function (see the sketch after these comments). – Soonts Jan 05 '20 at 02:57
  • This question is way too broad, but yes, it is very possible. What the minimum is depends heavily on the code: you would need to know the target (x86 alone is generally not enough; you would need more detail), very likely the operating system and its version, the kernel version if applicable, library dependencies, etc. Pretty much everything you need in order to run it on your machine you would need to know to run it on a remote machine, plus more, since a binary on your machine ideally didn't get created/installed without the installer or build checking for dependencies. – old_timer Jan 05 '20 at 03:37
  • On Linux you could use the package management system to discover what is there, based on the dependencies of the application. – old_timer Jan 05 '20 at 03:38
  • Or use a target-independent language (Java, Python, etc.); then you only need to know whether the interpreter/virtual machine is there. – old_timer Jan 05 '20 at 03:38
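Following up on Soonts' comment, here is a minimal sketch of the receiving side on Linux, assuming the shared object has already arrived and been saved as `./compute.so` and that it exports `int compute(int x)` (the file name and entry-point name are just the ones used in the comment; error handling is illustrative only):

```c
/* Hypothetical "server side" loader for a shared object that was sent over
 * the network and saved as ./compute.so. Build with: gcc loader.c -ldl
 */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Load the library the client sent us. */
    void *lib = dlopen("./compute.so", RTLD_NOW);
    if (!lib) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return EXIT_FAILURE;
    }

    /* Look up the agreed-upon entry point by name. */
    int (*compute)(int) = (int (*)(int))dlsym(lib, "compute");
    if (!compute) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(lib);
        return EXIT_FAILURE;
    }

    /* Call it; in a real setup the argument would come from the network
     * and the result would be sent back. */
    printf("compute(41) = %d\n", compute(41));

    dlclose(lib);
    return EXIT_SUCCESS;
}
```

With this split, the sending side only has to transfer the library file and the argument value; moving a small, well-defined payload is a much simpler problem than shipping arbitrary running machine code.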

2 Answers


The description you give is very vague, so I'm speculating, but if the objective is "take arbitrary existing native code and move it around the network transparently", the only possibility is to copy around the whole process. That is quite similar to fork on Unix-derived operating systems, except that the new process is to run on another machine.

fork essentially creates a complete clone of the currently running process, so the new process has its own copy of everything - private memory, open file descriptors, memory-mapped files & co. This can be made efficient locally (by copying memory only on demand, when it's actually changed), but in the remote case you have to actually copy and send everything; as for file mappings & co, that would be a real can of worms, because you'd need the same file system (in the same state) on the other side, and for stuff like pipes the OS would have to transparently replace them with sockets or something. This stuff is already complex locally; remotely it would be a nightmare.
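To make the "complete clone" point concrete, here is a tiny standard fork example (nothing migration-specific): the child operates on its own copy of the parent's memory, and it is exactly that per-process state a remote "fork" would have to ship over the network.

```c
/* Minimal demonstration that fork() gives the child its own copy of memory:
 * the child's change to x is not visible in the parent.
 */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int x = 41;
    pid_t pid = fork();

    if (pid == 0) {            /* child: works on its own copy of x */
        x += 1;
        printf("child:  x = %d\n", x);   /* prints 42 */
        return 0;
    }

    wait(NULL);                /* parent: waits, then sees its own, unchanged x */
    printf("parent: x = %d\n", x);       /* prints 41 */
    return 0;
}
```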

Copying the whole process is necessary because at this level you have lost pretty much all high-level information - functions are mostly a convention, and code can just jump around and do whatever it pleases with memory. Even assuming the code follows some calling convention, you have no way to know how many arguments there are, what their types are, or, if an input argument is a pointer, what the logical size of the block it points to is (you'd have to marshal that block as well).


OTOH, if you put some limits on what the "remotizable" code can do, the problem becomes more tractable. If we can assume that:

  • the code is self-contained (no random jumps around, possibly all packed in a shared object/DLL) and relocatable/position-independent;
  • no global state is used (including open files/sockets);
  • the arguments to these remotizable functions are made known to the runtime, i.e. it knows how to serialize/deserialize them;

then a more surgical approach can be implemented fairly easily - and has been done in many ways in the past (think DCOM). But this is anything but the "transparent" way you seem to have in mind in the OP.
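As a rough sketch of what such a surgical approach can look like under those assumptions, the fragment below shows the receiving side: the single `int` argument arrives in an agreed wire format, the exported function is looked up in the shared object that was shipped over, and the result is serialized back. The transport is omitted (stdin/stdout stand in for a socket), and the names `compute.so` / `compute` are just the illustrative ones from the comments above:

```c
/* Sketch of a tiny "RPC stub" under the assumptions above: the remotizable
 * function is int compute(int) inside ./compute.so, its argument and result
 * are serialized as 4-byte big-endian integers, and stdin/stdout stand in
 * for the network connection. Build with: gcc rpc_stub.c -ldl
 */
#include <arpa/inet.h>   /* ntohl / htonl: fixed byte order on the wire */
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *lib = dlopen("./compute.so", RTLD_NOW);
    if (!lib) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return EXIT_FAILURE;
    }
    int (*compute)(int) = (int (*)(int))dlsym(lib, "compute");
    if (!compute) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        return EXIT_FAILURE;
    }

    /* Deserialize the argument: 4 bytes in network byte order. */
    uint32_t wire;
    if (fread(&wire, sizeof wire, 1, stdin) != 1) {
        fprintf(stderr, "failed to read argument\n");
        return EXIT_FAILURE;
    }
    int x = (int)ntohl(wire);

    /* Call the shipped function and serialize the result back. */
    uint32_t result = htonl((uint32_t)compute(x));
    fwrite(&result, sizeof result, 1, stdout);

    dlclose(lib);
    return EXIT_SUCCESS;
}
```

Frameworks like DCOM automate exactly this marshalling step, generating the stubs from an interface description instead of writing them by hand.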

Matteo Italia
  • Fun fact: Mosix / OpenMosix (Linux single-system-image clustering) worked basically this way. It was less stable than mainline Linux (especially on SMP systems; the devs didn't have any SMP machines to test on), but it did actually work. When I was a sysadmin for a phylogenetics research group in ~2003 to 2006, I used it on one of our older 10-node clusters. – Peter Cordes Jan 05 '20 at 13:09
  • Aha, I was always curious about how it worked, but never really went in depth on it. The main difference from the approach I imagined (remotizing syscalls back to the home node) makes a lot of sense: having to synchronize kernel data structures would be a mess. – Matteo Italia Jan 05 '20 at 13:39

This is essentially what https://en.wikipedia.org/wiki/OpenMosix did: transparently migrate processes to other cluster nodes, making a cluster act kind of like a single system with many cores. (Development stopped in 2008).

It worked by suspending a process and sending all its mapped memory over the network to another node, where it would run. It had a mechanism for system calls to run on the home node so the whole process didn't have to migrate back just for that, e.g. copying the memory involved in read or write system calls.

All of this is fully transparent to the program; it lets you run single-threaded programs on a cluster easily, without needing a cluster job scheduler like Grid Engine. But it doesn't help a multi-threaded program take advantage of the CPUs in more than one node at once. It's too coarse-grained for that. (That's one of the major reasons why OpenMosix was abandoned.)

Peter Cordes