39

I'd like to build a C pre-processor / compiler that allows functions to be collected from local and online sources. ie:

#fetch MP3FileBuilder http://scripts.com/MP3Builder.gz
#fetch IpodDeviceReader http://apple.com/modules/MP3Builder.gz

void mymodule_main() {
  MP3FileBuilder(&some_data);
}

That's the easy part.

The hard part is I need a reliable way to "sandbox" the imported code from direct or unrestricted access to disk or system resources (including memory allocation and the stack). I want a way to safely run small snippets of untrusted C code (modules) without the overhead of putting them in a separate process, VM or interpreter (a separate thread would be acceptable though).

REQUIREMENTS

  • I'd need to put quotas on its access to data and resources including CPU time.
  • I will block direct access to the standard libraries
  • I want to stop malicious code that creates endless recursion
  • I want to limit static and dynamic allocation to specific limits
  • I want to catch all exceptions the module may raise (like divide by 0).
  • Modules may only interact with other modules via core interfaces
  • Modules may only interact with the system (I/O etc..) via core interfaces
  • Modules must allow bit ops, maths, arrays, enums, loops and branching.
  • Modules cannot use ASM
  • I want to limit pointer and array access to memory reserved for the module (via a custom safe_malloc())
  • Must support ANSI C or a subset (see below)
  • The system must be lightweight and cross-platform (including embedded systems).
  • The system must be GPL or LGPL compatible.

I'm happy to settle for a subset of C. I don't need things like templates or classes. I'm primarily interested in the things high-level languages don't do well like fast maths, bit operations, and the searching and processing of binary data.

It is not the intention that existing C code can be reused without modification to create a module. The intention is that modules would be required to conform to a set of rules and limitations designed to limit the module to basic logic and transformation operations (like a video transcode or compression operations for example).

The theoretical input to such a compiler/pre-processor would be a single ANSI C file (or safe subset) with a module_main function, no includes or pre-processor directives, and no ASM. It would allow loops, branching, function calls, pointer maths (restricted to a range allocated to the module), bit-shifting, bitfields, casts, enums, arrays, ints, floats, strings and maths. Anything else is optional.

EXAMPLE IMPLEMENTATION

Here's a pseudo-code snippet to explain this better. Here a module exceeds its memory allocation quota and also creates infinite recursion.

buffer* transcodeToAVI_main( &in_buffer ) {
    int buffer[1000000000]; // allocation exceeding quota
    while(true) {} // infinite loop
    return buffer;
}

Here's a transformed version where our preprocessor has added watchpoints to check for memory usage and recursion and wrapped the whole thing in an exception handler.

buffer* transcodeToAVI_main( &in_buffer ) {
    try {
        core_funcStart(__FILE__,__FUNC__); // tell core we're executing this function
        buffer = core_newArray(1000000000, __FILE__, __FUNC__); // memory allocation from quota
        while(true) {
           core_checkLoop(__FILE__, __FUNC__, __LINE__) && break; // break loop on recursion limit
        } 
        core_moduleEnd(__FILE__,__FUNC__);
    } catch {
        core_exceptionHandler(__FILE__, __FUNC__);
    }
    return buffer;
}
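
For readers who want something closer to compilable C, the pseudo-code above could be approximated with setjmp/longjmp standing in for try/catch. This is only a sketch under assumptions: the core_budget struct, the CORE_* codes, core_funcEnd and core_run are hypothetical names, and the core_* signatures differ from the pseudo-code (they take a budget pointer instead of __FILE__ and friends).

```c
#include <setjmp.h>

/* Hypothetical per-module accounting kept by the host ("core"). */
typedef struct {
    unsigned long steps_left;  /* loop iterations still allowed */
    unsigned      depth;       /* current call depth */
    unsigned      max_depth;   /* recursion limit */
    jmp_buf       abort_point; /* where control resumes when a quota is hit */
} core_budget;

enum { CORE_OK = 0, CORE_ABORT_STEPS = 1, CORE_ABORT_DEPTH = 2 };

/* The preprocessor would insert this at the top of every loop body. */
void core_checkLoop(core_budget *b) {
    if (b->steps_left == 0)
        longjmp(b->abort_point, CORE_ABORT_STEPS);
    b->steps_left--;
}

/* The preprocessor would insert these at function entry and exit. */
void core_funcStart(core_budget *b) {
    if (b->depth >= b->max_depth)
        longjmp(b->abort_point, CORE_ABORT_DEPTH);
    b->depth++;
}
void core_funcEnd(core_budget *b) { b->depth--; }

/* Instrumented module that would otherwise loop forever. */
int spin_module(core_budget *b) {
    core_funcStart(b);
    for (;;) {
        core_checkLoop(b); /* leaves via longjmp once the budget is spent */
    }
}

/* Instrumented module that would otherwise recurse forever. */
int recurse_module(core_budget *b) {
    core_funcStart(b);
    int r = recurse_module(b) + 1;
    core_funcEnd(b);
    return r;
}

/* Host-side wrapper: runs a module and reports why it stopped. */
int core_run(int (*module)(core_budget *), unsigned long steps,
             unsigned max_depth) {
    core_budget b;
    b.steps_left = steps;
    b.depth = 0;
    b.max_depth = max_depth;
    switch (setjmp(b.abort_point)) {
    case 0:
        module(&b);
        return CORE_OK;
    case CORE_ABORT_STEPS:
        return CORE_ABORT_STEPS;
    default:
        return CORE_ABORT_DEPTH;
    }
}
```

Calling core_run(spin_module, 1000, 16) returns CORE_ABORT_STEPS rather than hanging, and core_run(recurse_module, 1000, 16) returns CORE_ABORT_DEPTH. Note that this scheme is cooperative: it only holds if the module cannot write to the budget struct, which is exactly the pointer problem discussed in the answers.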

I realise performing these checks impacts the module's performance, but I suspect it will still outperform high-level or VM languages for the tasks it is intended to solve. I'm not trying to stop modules doing dangerous things outright, I'm just trying to force those dangerous things to happen in a controlled way (like via user feedback). ie: "Module X has exceeded its memory allocation, continue or abort?".

UPDATE

The best I've got so far is to use a custom compiler (Like a hacked TCC) with bounds checking and some custom function and looping code to catch recursions. I'd still like to hear thoughts on what else I need to check for or what solutions are out there. I imagine that removing ASM and checking pointers before use solves a lot of the concerns expressed in previous answers below. I added a bounty to pry some more feedback out of the SO community.

For the bounty I'm looking for:

  • Details of potential exploits against the theoretical system defined above
  • Possible optimisations over checking pointers on each access
  • Experimental open-source implementations of the concepts (like Google Native Client)
  • Solutions that support a wide range of OS and devices (no OS/hardware based solutions)
  • Solutions that support the most C operations, or even C++ (if that's possible)

Extra credit for a method that can work with GCC (ie, a pre-processor or small GCC patch).

I'll also give consideration to anyone who can conclusively prove what I'm attempting cannot be done at all. You will need to be pretty convincing though because none of the objections so far have really nailed the technical aspects of why they think it's impossible. In the defence of those who said no this question was originally posed as a way to safely run C++. I have now scaled back the requirement to a limited subset of C.

My understanding of C could be classed as "intermediate", my understanding of PC hardware is maybe a step below "advanced". Try to couch your answers for that level if you can. Since I'm no C expert I'll be going largely based on votes given to an answer as well as how closely the answer comes to my requirements. You can assist by providing sufficient evidence for your claims (respondents) and by voting (everyone else). I'll assign an answer once the bounty countdown reaches 6 hours.

Finally, I believe solving this problem would be a major step towards maintaining C's relevance in an increasingly networked and paranoid world. As other languages close the gap performance-wise and computing power grows it will be harder and harder to justify the added risk of C development (as it is now with ASM). I believe your answers will have a much greater relevance than scoring a few SO points so please contribute what you can, even if the bounty has expired.

SpliFF
  • 1
    Here's a challenge for that compiler: template void foo(typename A::X ax) { B::Y(sizeof ax()); } – MSalters Jun 11 '09 at 12:41
  • I probably wouldn't allow modules to use templates at all. – SpliFF Jun 11 '09 at 12:43
  • 2
    That is rather like not allowing them to use integers. –  Jun 11 '09 at 12:49
  • 1
    not really, templates are more of a convenience than a necessity. It's my intention that modules be reasonably small rather than large multi-file projects. Each module should do one thing well as in aToB(a) not aTo(a,b). – SpliFF Jun 11 '09 at 13:18
  • 3
    Note that templates and classes don't actually exist in C. – bdonlan Jun 14 '09 at 19:05
  • 1
    @bdonlan: This question started life as a query about safe C++. I've scaled back my expectations since then. – SpliFF Jun 14 '09 at 19:11
  • Your modules may be small so you obviously don't want to spawn a new process each time, however you could re-use an already spawned process to call the checked module many times. You can also migrate your code that calls the module into the process itself. You only need to kill the process when the module violates the rules. Done right, this could be fast, secure, and portable. – Paul Jun 14 '09 at 19:17
  • The main problem with that is I would like modules to share a large data stream and core library. I could pass the stream around via IPC/mmap between processes but I'd prefer the modules be called as functions and have a direct pointer to the shared stream (or some other input). I don't mind if they screw up the stream, it's everything else I want them isolated from (especially things that crash). – SpliFF Jun 14 '09 at 19:22
  • I think mmap gives you the direct pointer to the data stream. As for calling modules as functions, your preprocessor can generate "proxy functions" for each module that sends a command to the sandboxed process to run a given module. For speed reasons, if you're calling a lot of modules in series, you may want your code that calls modules to be in the sandboxed process as well. – Paul Jun 14 '09 at 19:34
  • Check this, they have a sandbox, http://codepad.org/ and http://codepad.org/about talks about their use of virtual machines. – Liran Orevi Jun 17 '09 at 00:53
  • @Liran: Interesting project (also, Geordi which it is based on). The catch with codepad is that it not only puts the code in a separate process but it puts it on an entirely different computer! I did find some interesting g++ flags on the about page though which I've put in an answer below. – SpliFF Jun 17 '09 at 01:38
  • 1
    I have absolutely nothing of value to add to this thread, except maybe the comment that I think you're truly wonderfully and gloriously barking mad. Trying to rein in pointer arithmetic is the smell of the sound of one hand clapping ;-) I truly wish you the very best with this project, and I really don't care how useless it is, it's just FUN!... Cheers mate. Keith. – corlettk Jun 20 '09 at 05:43
  • You might have an easier start by writing an extension to Clang. – Joseph Garvin Oct 14 '11 at 15:23
  • This is an old question, but I have to write this: there is no recursion in your example. Also, returned buffer is invalid as it was on stack. – val - disappointed in SE Jun 22 '19 at 13:59

13 Answers

16

Since the C standard is much too broad to be allowed, you would need to go the other way around: specify the minimum subset of C which you need, and try to implement that. Even ANSI C is already too complicated and allows unwanted behaviour.

The most problematic aspect of C is pointers: the language requires pointer arithmetic, and that arithmetic is not checked. For example:

char a[100];
printf("%p %p\n", (void *)&a[10], (void *)&10[a]);

prints the same address twice, since a[10] == 10[a] == *(10 + a) == *(a + 10).

All these pointer accesses cannot be checked at compile time. That's the same complexity as asking the compiler for 'all bugs in a program' which would require solving the halting problem.

Since you want this function to be able to run in the same process (potentially in a different thread) you share memory between your application and the 'safe' module since that's the whole point of having a thread: share data for faster access. However, this also means that both threads can read and write the same memory.

And since you cannot prove at compile time where pointers end up, you have to do that at runtime. That means code like 'a[10]' has to be translated to something like 'get_byte(a + 10)', at which point I wouldn't call it C anymore.
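
To make the runtime check concrete, here is a minimal sketch of what a generated accessor might look like. It assumes the module addresses its memory by offset into a single region rather than by raw pointer (which sidesteps comparing out-of-range pointers, itself undefined in C). The module_region struct, region_contains, set_byte and the faulted flag are hypothetical; only the name get_byte comes from the text above.

```c
#include <stddef.h>

/* Hypothetical description of the memory a module may touch. */
typedef struct {
    unsigned char *base;    /* start of the module's allocation */
    size_t         size;    /* number of bytes in the allocation */
    int            faulted; /* set when an out-of-range access is attempted */
} module_region;

/* Returns 1 when [offset, offset + len) lies wholly inside the region.
   Written so that offset + len is never computed and cannot overflow. */
int region_contains(const module_region *r, size_t offset, size_t len) {
    return len <= r->size && offset <= r->size - len;
}

/* What a read such as a[10] would be rewritten into. */
unsigned char get_byte(module_region *r, size_t offset) {
    if (!region_contains(r, offset, 1)) {
        r->faulted = 1; /* the host decides what to do about it */
        return 0;
    }
    return r->base[offset];
}

/* What a write such as a[10] = v would be rewritten into. */
void set_byte(module_region *r, size_t offset, unsigned char v) {
    if (!region_contains(r, offset, 1)) {
        r->faulted = 1;
        return;
    }
    r->base[offset] = v;
}
```

With a 16-byte region, set_byte(&r, 10, 42) succeeds, while set_byte(&r, 16, 7) leaves memory untouched and sets r.faulted. The cost is a compare and branch on every access, which is the overhead the comments below worry about.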

Google Native Client

So if that's true, how does Google do it? Well, in contrast to the requirements here (cross-platform, including embedded systems), Google concentrates on x86, which in addition to paging with page protections also has segment registers. This allows it to create a sandbox where another thread does not share the same memory in the same way: the sandbox is limited by segmentation to changing only its own memory range. Furthermore:

  • a list of safe x86 assembly constructs is assembled
  • gcc is changed to emit those safe constructs
  • this list is constructed in a way that is verifiable.
  • after loading a module, this verification is done

So this is platform specific and is not a 'simple' solution, although a working one. Read more at their research paper.
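
Where hardware segmentation is not available, a software analogue of the same confinement is address masking, the basic trick behind software fault isolation. This is a different technique from the segment-register approach described above, shown only as a hedged sketch: SANDBOX_SIZE, sandbox_load and sandbox_store are hypothetical names, and the sandbox size is assumed to be a power of two. Instead of checking and branching, every address is forced into range with a single AND.

```c
#include <stdint.h>

#define SANDBOX_SIZE 4096u                /* assumed: must be a power of two */
#define SANDBOX_MASK (SANDBOX_SIZE - 1u)

/* The only memory the module is allowed to read or write. */
static unsigned char sandbox[SANDBOX_SIZE];

/* Every store the module performs is rewritten to go through the mask,
   so an out-of-range address wraps around inside the sandbox. */
void sandbox_store(uint32_t addr, unsigned char v) {
    sandbox[addr & SANDBOX_MASK] = v;
}

/* Every load is masked the same way. */
unsigned char sandbox_load(uint32_t addr) {
    return sandbox[addr & SANDBOX_MASK];
}
```

A bad address no longer raises an error; it simply lands somewhere inside the module's own memory. The module can corrupt itself but nothing else, which matches the "I don't mind if they screw up the stream" stance in the question's comments, at the price of not detecting the fault.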

Conclusion

So whatever route you take, you need to start out with something new which is verifiable, and only then can you start adapting an existing compiler or writing a new one. However, trying to mimic ANSI C requires one to think about the pointer problem. Google modelled their sandbox not on ANSI C but on a subset of x86, which allowed them to use existing compilers to a great extent, with the disadvantage of being tied to x86.

Bugster
Rutger Nijlunsing
  • Nice reply. I assume though that since pointers are resolved at runtime there should be some kind of "pointer operation" going on. I was hoping to replace/hook that operation similar to what you describe with get_byte() being a sister function to safe_malloc() so pointers are always constrained to the module's allocation. I realise there is a performance penalty involved but it shouldn't be outrageous (check if address X is in a given range before continuing). I probably need to do this check in ASM to keep the performance impact low. – SpliFF Jun 14 '09 at 18:48
  • I guess some compilers might "cheat" if they can establish a pointer address at compile time. I'll need to check this out. – SpliFF Jun 14 '09 at 18:48
  • Ideally I would only check pointers when they are assigned or modified, however I guess this would involve catching all pointer arithmetic including pointer-to-a-pointer chains. Is there a difference in the complexity (in the compiler) of knowing when a pointer is set vs. knowing when it is used? – SpliFF Jun 14 '09 at 19:00
  • "I was hoping to replace/hook that operation similar to what you describe with get_byte() being a sister function to safe_malloc() so pointers are always constrained to the module's allocation.": I think pointer access is too 'easy' to do in assembly (in contrast to malloc) which means that gcc will probably not have isolated this to be wrapped. – Rutger Nijlunsing Jun 21 '09 at 18:11
  • If indeed gcc does not have it wrapped which I suspect, you might be able to 'transform' the generated assembly by substituting each pointer access to a function which does the check. But this is going to be expensive. That's why Google Native Client uses the segmentation of x86 CPUs to do this. – Rutger Nijlunsing Jun 21 '09 at 18:13
  • "Ideally I would only check pointers when they are assigned or modified": A lot of pointers in C are constructed on-the-fly, like a[i] where i is a variable. Since those do appear in tight loops it is not going to gain you much. – Rutger Nijlunsing Jun 21 '09 at 18:15
10

I think you would get a lot out of reading about some of the implementation concerns and choices Google made when designing Native Client, a system for executing x86 code (safely, we hope) in the browser. You may need to do some source-rewriting or source-to-source compilation to make the code safe if it's not, but you should be able to rely on the NaCl sandbox to catch your generated assembly code if it tries to do anything too funky.

Matt J
  • Native Client looks like a very interesting project. I will certainly have a peek at the code. Thanks. – SpliFF Jun 12 '09 at 06:50
5

If I were going to do this, I would investigate one of two approaches:

  • Use CERN's CINT to run sandboxed code in an interpreter and see about restricting what the interpreter permits. This would probably not give terribly good performance.
  • Use LLVM to create an intermediate representation of the C++ code and then see if it's feasible to run that bytecode in a sandboxed Java-style VM.

However, I agree with others that this is probably a horribly involved project. Look at the problems that web browsers have had with buggy or hung plugins destabilizing the entire browser. Or look at the release notes for the Wireshark project; almost every release, it seems, contains security fixes for problems in one of its protocol dissectors that then affect the entire program. If a C/C++ sandbox were feasible, I'd expect these projects to have latched onto one by now.

Josh Kelley
5

I stumbled upon Tiny C Compiler (TCC). This may be what I need:

*  SMALL! You can compile and execute C code everywhere, for example on rescue disks (about 100KB for x86 TCC executable, including C preprocessor, C compiler, assembler and linker).
* FAST! tcc generates x86 code. No byte code overhead. Compile, assemble and link several times faster than GCC.
* UNLIMITED! Any C dynamic library can be used directly. TCC is heading toward full ISOC99 compliance. TCC can of course compile itself.
* SAFE! tcc includes an optional memory and bound checker. Bound checked code can be mixed freely with standard code.
* Compile and execute C source directly. No linking or assembly necessary. Full C preprocessor and GNU-like assembler included.
* C script supported : just add '#!/usr/local/bin/tcc -run' at the first line of your C source, and execute it directly from the command line.
* With libtcc, you can use TCC as a backend for dynamic code generation.

It's a very small program which makes hacking on it a viable option (hack GCC?, not in this lifetime!). I suspect it will make an excellent base to build my own restricted compiler from. I'll remove support for language features I can't make safe and wrap or replace the memory allocation and loop handling.

TCC can already do bounds checking on memory accesses, which is one of my requirements.

libtcc is also a great feature, since I can then manage code compilation internally.

I don't expect it to be easy but it gives me hope I can get performance close to C with fewer risks.

Still want to hear other ideas though.

SpliFF
  • You need to be a lot clearer if you are addressing C or C++ - they are not the same language. –  Jun 11 '09 at 17:11
  • I had planned on either but it's becoming clear that C++ throws a truckload of new complications into an already diabolical problem. Everything these modules are intended to do is achievable from C (or even a subset of C) so I'm not going to lose any sleep if I can't make it work for C++. – SpliFF Jun 11 '09 at 18:08
  • This is probably the most practical solution. But given Moore's law, I think you are prematurely optimizing the system by using C. I do believe that Java is a better way to go for these sorts of things. Take a look at Google App Engine: it does similar things, but in a completely different way than compiling native code. They achieve speed by using many machines and parallelizing. I suspect what you want to do is achievable by this means. – Chii Jun 20 '09 at 09:39
5

This isn't trivial, but it's not that hard.

You can run binary code in a sand box. Every operating system does this all day long.

They're going to have to use your standard library (vs a generic C lib). Your standard library will enforce whatever controls you want to impose.

Next, you'll want to ensure that they cannot create "runnable code" at run time. That is, the stack isn't executable, they can't allocate any memory that's executable, etc. That means that only the code generated by the compiler (YOUR compiler) will be executable.

If your compiler signs its executable cryptographically, your runtime will be able to detect tampered binaries, and simply not load them. This prevents them from "poking" things in to the binaries that you simply don't want them to have.

With a controlled compiler generating "safe" code, and a controlled system library, that should give a reasonably controlled sandbox, even with actual machine language code.

Want to impose memory limits? Put a check in to malloc. Want to restrict how much stack is allocated? Limit the stack segment.
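
As a rough illustration of "put a check in to malloc", a quota-enforcing allocator could look like the sketch below. The names mem_quota, block_header and safe_free are hypothetical (safe_malloc is the name used in the question, though its signature here is an assumption), and max_align_t requires C11.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-module memory accounting. Invariant: used <= limit. */
typedef struct {
    size_t limit; /* bytes the module may hold at once */
    size_t used;  /* bytes currently handed out */
} mem_quota;

/* Header placed in front of each block so safe_free knows its size.
   The union keeps the user pointer suitably aligned. */
typedef union {
    size_t      size;
    max_align_t align;
} block_header;

/* Allocates n bytes against the quota; returns NULL if it would be exceeded. */
void *safe_malloc(mem_quota *q, size_t n) {
    if (n > q->limit - q->used)            /* would exceed the quota */
        return NULL;
    if (n > SIZE_MAX - sizeof(block_header)) /* size computation would overflow */
        return NULL;
    block_header *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    q->used += n;
    return h + 1; /* memory handed to the module starts after the header */
}

/* Releases a block from safe_malloc and returns its bytes to the quota. */
void safe_free(mem_quota *q, void *p) {
    if (p == NULL)
        return;
    block_header *h = (block_header *)p - 1;
    q->used -= h->size;
    free(h);
}
```

With a 1024-byte quota, a 1000-byte request succeeds and a following 100-byte request returns NULL until the first block is freed. The header sits right next to module-writable memory, so this accounting is only trustworthy if pointer accesses are bounds-checked as well.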

Operating systems create these kinds of constrained environments using their Virtual Memory managers all day long, so you can readily do these things on modern OS's.

Whether the effort to do this is worthwhile vs using an off the shelf Virtual Machine and byte code runtime, I can't say.

Will Hartung
  • It requires a newer processor for the 'not-executable' bit, and even then, you'd probably require a driver for whatever OS you use cause it probably won't directly support twiddling with the paging table and such important things. Also, if you allow inline assembly, what stops code from doing the system call directly(it's just an int 0x80 away) I think it is impossible to have a completely safe sandbox without machine level virtualization. – Earlz Jun 12 '09 at 04:36
  • Ok now we're making progress. I have no objection to removing libc and adding a custom malloc/free implementation. Static allocation I can measure at compile time and pointer range checking can cooperate with the custom malloc to limit the bounds for pointer referencing. What techniques exist for making the stack 'non-executable' and limiting the 'stack segment'? Is it enough to simply ban inline ASM? – SpliFF Jun 12 '09 at 06:14
  • "You can run binary code in a sand box. Every operating system does this all day long.": That's true, but that is normally called process. And SpliFF doesn't want an extra process, so that won't work. – Rutger Nijlunsing Jun 14 '09 at 17:42
  • "They're going to have to use your standard library (vs a generic C lib). Your standard library will enforce whatever controls you want to impose.": The C include files are handy but do not disable / enable any functionality. You don't need stdio.h or libc to write to stdout, you can do that for example by including your own (adapted) stdio.h – Rutger Nijlunsing Jun 14 '09 at 17:44
  • "Next, you'll want ensure that they can not create "runnable code" at run time. That is, the stack isn't executable, they can't allocate any memory that's executable, etc. That means that only the code generated by the compiler (YOUR compiler) will be executable.": No. In C you can make the stack executable again (undo the non-executable making) and/or change the already existing running code which is executable (...but first change read-only flag on that block of memory). – Rutger Nijlunsing Jun 14 '09 at 17:45
  • "If your compiler signs its executable cryptographically, your runtime will be able to detect tampered binaries, and simply not load them. This prevents them from "poking" things in to the binaries that you simply don't want them to have.": If you change the code while running, nothing is checked or the check is done too late: after the malicious code has run already. Not a safe feeling. – Rutger Nijlunsing Jun 14 '09 at 17:46
  • "Want to impose memory limits? Put a check in to malloc.": So you call sbrk() [Unix] or VirtualAlloc() [Windows] yourself. And/or another library function which is already loaded and which needs to allocate memory. – Rutger Nijlunsing Jun 14 '09 at 17:48
  • "Operating systems create these kinds of constrained environments using their Virtual Memory managers all day long, so you can readily do these things on modern OS's.": But OSes keep processes from biting each other and do not guarantee anything within a process. It becomes a kind of http://www.corewars.org/ . – Rutger Nijlunsing Jun 14 '09 at 17:50
3

8 years later and I've discovered a new platform that meets all of my original requirements. WebAssembly allows you to run a C/C++ subset safely inside a browser and comes with similar safety restrictions to my requirements, such as restricting memory access and preventing unsafe operations on the OS and parent process. It's been implemented in Firefox 52 and there are promising signs other browsers will support it in the future.

SpliFF
3

Perfectly impossible. The language just doesn't work this way. The concept of classes is lost very early in most compilers, including GCC. Even if it were kept, there would be no way to associate each memory allocation with a live object, let alone a "module".

MSalters
  • Would it not be possible to somehow rewrite allocations? something like "int a;" -> "int a = system.newint(_MODULE_);". Remember I have a custom preprocessor between the code and the compiler and I don't mind losing efficiency for the sake of safety. – SpliFF Jun 11 '09 at 10:01
  • 1
    Not really. You can't just rewrite tokens in the assumption you know what they mean from the direct context. Especially with templates, you will have to run a full C++ compiler. You have to do two-phase name lookup, argument-dependent lookup and all that. Now, the backend of this compiler could in theory output C++ suitable for G++, but what's the point? You've got a compiled representation already. Why not target the GCC backend instead? But you'd be writing a compiler yourself. – MSalters Jun 11 '09 at 11:20
3

I haven't investigated this very closely, but the guys working on Chromium (aka Google Chrome) are working on a sandbox almost like this already, which might be worth looking into.

http://dev.chromium.org/developers/design-documents/sandbox/Sandbox-FAQ

It's open source, so should be possible to use it.

Epcylon
  • Going by the description it looks like it uses separate processes: "The only resources sandboxed processes can freely use are CPU cycles and memory.". I will still have a closer look into it and see if there's anything useful but I doubt it will meet my requirements. – SpliFF Jun 14 '09 at 19:08
2

Liran pointed out codepad.org in a comment above. It isn't suitable because it relies on a very heavy environment (consisting of ptrace, chroot, and an outbound firewall). However, I found a few g++ safety switches there which I thought I'd share here:

gcc 4.1.2 flags: -O -fmessage-length=0 -fno-merge-constants -fstrict-aliasing -fstack-protector-all

g++ 4.1.2 flags: -O -std=c++98 -pedantic-errors -Wfatal-errors -Werror -Wall -Wextra -Wno-missing-field-initializers -Wwrite-strings -Wno-deprecated -Wno-unused -Wno-non-virtual-dtor -Wno-variadic-macros -fmessage-length=0 -ftemplate-depth-128 -fno-merge-constants -fno-nonansi-builtins -fno-gnu-keywords -fno-elide-constructors -fstrict-aliasing -fstack-protector-all -Winvalid-pch

The options are explained in the GCC manual.

What really caught my eye was the stack-protector flag. I believe it is a merge of this IBM research project (Stack-Smashing Protector) with the official GCC.

The protection is realized by buffer overflow detection and the variable reordering feature to avoid the corruption of pointers. The basic idea of buffer overflow detection comes from StackGuard system.

The novel features are (1) the reordering of local variables to place buffers after pointers to avoid the corruption of pointers that could be used to further corrupt arbitrary memory locations, (2) the copying of pointers in function arguments to an area preceding local variable buffers to prevent the corruption of pointers that could be used to further corrupt arbitrary memory locations, and (3) the omission of instrumentation code from some functions to decrease the performance overhead.

Community
SpliFF
2

If the language is Turing complete, it is impossible to write a static code verifier that can decide, for every possible program, whether that program is safe or unsafe. The problem is equivalent to the halting problem.

Of course this point is moot if you have supervisor code running at a lower ring level, or if the code runs in an interpreter (ie. one emulating machine resources).

The best way to do this is to start the code in another process (IPC is not that bad) and trap its system calls with something like ptrace on Linux: http://linux.die.net/man/2/ptrace

Unknown
  • There are several problems with this approach. IPC may not be "that bad" but I didn't really make it clear how small some of these modules could be. Many will be little more than a 10 line function but they may be called millions of times by other modules. IPC/Fork is simply too heavy for what I have in mind. – SpliFF Jun 12 '09 at 06:48
  • That's simply not true. I'll give you a trivially simple example: a basic Turing machine simulator. No pointers, no system commands, no output, and you can easily check that a purported description of a Turing machine truly is one. What you *can't* do is predict if it halts but it's trivial to write a Turing machine simulator that no simulated machine can break out of. I say stalling from writing up my computability paper I was supposed to get to my collaborator a month ago. – Peter Gerdes Sep 07 '20 at 23:14
0

Nice idea, but I'm fairly sure what you're trying to do is impossible with C or C++. If you dropped the sandbox idea it might work.

Java's already got a similar system (as in a large library of 3rd party code) in Maven2.

Glen
  • 1
    I'm interested in the specifics of why it can't be done. I've done quite a few things that couldn't be done before and I'm prepared to try anyway. I just don't know what the traps are I'm looking for. – SpliFF Jun 11 '09 at 09:43
  • ok, how do you stop malicious code deliberately coding in a buffer overflow scenario? How do you restrict access to the STL, given that most third party code will use it, e.g. std::string. If you can use the STL, you can use streams. If you can use streams you can edit files on the HDD. – Glen Jun 11 '09 at 09:49
  • it isn't a requirement for modules to load any libraries/headers they want. The preprocessor could simply undef all unsupported includes for example. I'd prefer the modules do not load STL or even things like stdio directly. It's my intention that modules focus more on mathematical transformations than external IO or A/V. – SpliFF Jun 11 '09 at 10:04
  • The problem is that, in general, C and C++ compile to the bare silicon. This means that they have the full power of the computer available. Java is more amenable to this since it compiles to a VM and a VM theoretically can have all sorts of restrictions (not that it works perfectly in practice). If you can find a C interpreter, you might be able to restrict it enough. (You can't compile arbitrary C code to a JVM, last I looked, but that was a long time ago. It doesn't respect data types enough.) – David Thornley Jun 11 '09 at 14:22
0

If you want to be really sure, I think the best and perhaps only way to do this is to go down the route of separate processes and let the O/S handle the access control. It's not that painful to write a generic threaded loader and once you have it, you can override some functions to load specific libraries.

Jon Cage
  • Separate processes would be ok for large modules but what I'm planning needs to support very small modules (perhaps no more than 10 lines of C) and modules may be called millions of times per second. No IPC techniques are lightweight enough for what I'm trying to do. – SpliFF Jun 12 '09 at 06:55
0

You appear to be trying to solve two non-problems. In my own code I have no memory allocation problems or issues with recursion or infinite loops.

What you seem to be proposing is a different, more limited language than C++. This is something you can pursue of course, but as others have noted you will have to write a compiler for it - simple textual processing will not give you what you want.

  • I'm not trying to protect my application from my own code, I'm trying to protect it from included (possibly malicious) source code coming from untrusted sources. If I download and compile the external module I want the framework to stop the module bringing down the whole system whether accidentally or by design. The proposal is about how to detect and prevent this without going for a heavier implementation like a true VM or interpreter (or writing a C compiler from scratch). – SpliFF Jun 11 '09 at 12:52
  • Well, good luck - but I think what you are suggesting is not possible, and certainly not useful. –  Jun 11 '09 at 12:55