
I have multiple scripts that export the same interface, and they're executed using execfile() in isolated scopes.

The thing is, I want them to share some resources so that each new script doesn't have to load them again from scratch, losing startup speed and using an unnecessary amount of RAM.

The scripts are in reality much better encapsulated and guarded from malicious plug-ins than presented in the example below; that's where my problems begin.

The thing is, I want the script that creates a resource to be able to fill it with data, remove data or remove the resource, and of course access its data.

But other scripts shouldn't be able to change another script's resource, just read it. I want to be sure that newly installed plug-ins cannot interfere with already loaded and running ones by abusing shared resources.

Example:

class SharedResources:
    # Here should be a shared resource manager that I tried to write
    # but got stuck. That's why I ask this long and convoluted question!
    # Some beginning:
    def __init__ (self, owner):
        self.owner = owner

    def __call__ (self):
        # Here we should return some object that will do
        # required stuff. Read more for details.
        pass

class plugin (dict):
    def __init__ (self, filename):
        dict.__init__(self)
        # Here some checks and filling with secure versions of __builtins__ etc.
        # ...
        self["__name__"] = "__main__"
        self["__file__"] = filename
        # Add a shared resources manager to this plugin
        self["SharedResources"] = SharedResources(filename)
        # And then:
        execfile(filename, self, self)

    # Expose the plug-in interface to outside world:
    def __getattr__ (self, a):
        return self[a]
    def __setattr__ (self, a, v):
        self[a] = v
    def __delattr__ (self, a):
        del self[a]
    # Note: I didn't use self.__dict__ because this makes encapsulation easier.
    # In the future I won't use the object itself at all, but a separate dict. For now let it be.

----------------------------------------
# An example of two scripts that use a shared resource; each is run with plugins["name"] = plugin("<filename>"):
# The presented code is the same in both scripts; what comes after it will differ.

def loadSomeResource ():
    # Do it here...
    return loadedresource

# Load this resource from shared resources if it's already there; if it isn't, load it and add it:
shr = SharedResources() # This would be an instance allowing access to shared resources
if not shr.has_key("Default Resources"):
    shr.create("Default Resources")
if not shr["Default Resources"].has_key("SomeResource"):
    shr["Default Resources"].add("SomeResource", loadSomeResource())
resource = shr["Default Resources"]["SomeResource"]
# And then we use the resource variable normally; it can be any object.
# Here I used the category "Default Resources" to add and/or retrieve a resource named "SomeResource".
# I want more categories so that plug-ins that deal with audio aren't mixed with plug-ins that deal with video, for instance. But this is not strictly needed.
# Here comes code specific for each plug-in that will use shared resource named "SomeResource" from category "Default Resources".
...
# And end of plugin script!
----------------------------------------

# And then, in main program we load plug-ins:
import os
plugins = {} # Here we store all loaded plugins
for x in os.listdir("plugins"):
    plugins[x] = plugin(x)

Let's say that our two scripts are stored in the plugins directory and both use some WAVE files loaded into memory. The plugin that loads first will load the WAVE and put it into RAM. The other plugin will be able to access the already loaded WAVE, but not replace or delete it, and thus cannot mess with the first plugin.

Now, I want each resource to have an owner (some id or the filename of the plugin script), and I want each resource to be writable only by its owner.

No tweaking or workarounds should enable one plugin to modify another's resources.
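To make the requirement concrete, here is a rough sketch of the rule I want enforced. The class and names (OwnedPool, NotOwner) are just illustration, not my real code:

```python
class NotOwner(Exception):
    pass

class OwnedPool(object):
    """Central pool: readable by everyone, writable only by a resource's owner."""

    def __init__(self):
        self._data = {}    # name -> resource
        self._owners = {}  # name -> owner id (e.g. plugin filename)

    def add(self, owner, name, value):
        # only the original owner may (re)bind a resource under this name
        if name in self._owners and self._owners[name] != owner:
            raise NotOwner("%r belongs to %r" % (name, self._owners[name]))
        self._owners[name] = owner
        self._data[name] = value

    def get(self, name):
        # reads are open to every plugin
        return self._data[name]

    def remove(self, owner, name):
        if self._owners.get(name) != owner:
            raise NotOwner("only %r may remove %r" % (self._owners.get(name), name))
        del self._data[name]
        del self._owners[name]

pool = OwnedPool()
pool.add("first.py", "SomeWave", b"RIFF...")
pool.get("SomeWave")                       # any plugin may read it
# pool.add("second.py", "SomeWave", b"")   # would raise NotOwner
```

The open question is how to hand each plugin a handle that carries its identity so it cannot simply lie about the owner argument.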

I almost did it and then got stuck, and my head is spinning with concepts that, when implemented, do the job only partially. This eats at me, so I cannot concentrate any more. Any suggestion is more than welcome!

Adding:

This is what I use now without any safety included:

# Dict that will hold a category of resources (should implement some security):
class ResourceCategory (dict):
    def __getattr__ (self, i): return self[i]
    def __setattr__ (self, i, v): self[i] = v
    def __delattr__ (self, i): del self[i]

SharedResources = {} # Resource pool

class ResourceManager:
    def __init__ (self, owner):
        self.owner = owner

    def add (self, category, name, value):
        if not SharedResources.has_key(category):
            SharedResources[category] = ResourceCategory()
        SharedResources[category][name] = value

    def get (self, category, name):
        return SharedResources[category][name]

    def rem (self, category, name=None):
        if name is None: del SharedResources[category]
        else: del SharedResources[category][name]

    def __call__ (self, category):
        if not SharedResources.has_key(category):
            SharedResources[category] = ResourceCategory()
        return SharedResources[category]

    __getattr__ = __getitem__ = __call__

    # When securing, this must not be left like this; it is insecure and can
    # provide a way back into the SharedResources pool:
    has_category = has_key = SharedResources.has_key

Now a plugin capsule:

class plugin(dict):
    def __init__ (self, path, owner):
        dict.__init__(self)
        self["__name__"] = "__main__"
        # etc. etc.
        # And when adding resource manager to the plugin, register it with this plugin as an owner
        self["SharedResources"] = ResourceManager(owner)
        # ...
        execfile(path, self, self)
        # ...

Example of a plugin script:

#-----------------------------------
# Get the category we want (using __call__()). Note: if a category doesn't exist, it is created automatically.
AudioResource = SharedResources("Audio")
# Use an MP3 resource (let's say a bytestring):
if not AudioResource.has_key("Beep"):
    f = open("./sounds/beep.mp3", "rb")
    AudioResource.Beep = f.read()
    f.close()
# Take a reference out for fast access and a nicer look:
beep = AudioResource.Beep
# BTW, immutables don't propagate as references by themselves, do they? A copy
# would be returned, so RAM usage would increase instead. If so, immutables
# should be wrapped in a composite data type.

This works perfectly, but, as I said, messing with resources is far too easy here.

I would like an instance of ResourceManager() to be in charge of deciding whom to return which version of the stored data.
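Regarding the aside in the script above: assignment in Python never copies, even for immutables. Binding just adds another name for the same object, so RAM is not duplicated:

```python
big = "x" * 10 ** 6      # a large immutable string (~1 MB)
alias = big              # plain assignment: another reference, not a copy
print(alias is big)      # True: one object, two names, no extra RAM
```

So immutable resources can be handed out directly without a wrapper as far as memory is concerned; the wrapper is only needed for access control on mutable containers.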

Dalen
  • Do you trust the writers of plugins not to be malicious? If you can't trust the authors, it's been shown that making it safe to eval / exec / execfile is mostly impossible. See here: http://programmers.stackexchange.com/a/191628 and/or google for "python exec untrusted". If you trust the plugin authors not to try to circumvent your sandboxing, then you might come up with a system that will prevent people from accidentally breaking shared resources and/or doing arbitrary things. – Matt Anderson Sep 05 '15 at 04:12
  • I don't trust them, but I can ignore what people do to their own computers. As for sharing, I'll check all new plugins before making them installable from the server by a plugin manager. And, no, it is not impossible to make a completely secure sandbox. There existed a rexec module that used to be as secure as possible, but it is not developed any more and is now deemed insecure. But the concept is OK and it can be improved to be completely secure. – Dalen Sep 05 '15 at 11:35
  • After forbidding a plugin writer access to modules that can influence the user's data and such, and giving him/her restricted versions of them, you can monitor the execution for any memory or CPU abuse, and you can check the source code for any backward call to the main scope that you can't otherwise control through restrictions alone. You can simply forbid class creation, for instance. But no, this much is not necessary for now. It depends on how popular the app will become. But as it is needed by a certain group of people, I am sure that it will be used. – Dalen Sep 05 '15 at 11:51
  • As I said, I'll make a plugin manager that will install only checked plugins. I simply want to ensure that no one tries to change others' resources. You know, raise an error saying that it is not nice to do so, so that writers don't try it any more. – Dalen Sep 05 '15 at 11:53
  • What are "resources" in this scenario? Can they be fully described by immutable data structures (or objects that assume that behavior)? I.e., `(str, int, float, tuple, frozenset)`, `collections.Mapping` (immutable equivalent of `dict`), and maybe file-like objects in read-only mode? Hand immutable stand-in objects to the "non-owner" participants? – Matt Anderson Sep 05 '15 at 19:49
  • Genial! Returning immutable counterparts solves the problem of wrapping them in some instance that would control their access. I admit this didn't occur to me at all. Resources will mainly be dictionaries, I expect, but I meant to let writers add anything to the resources. Non-owner readers are the problem. – Dalen Sep 05 '15 at 23:49
  • But immutable objects only solve the problem of changing them. There are still two main problems to be solved: how to know who is the owner of what (without making custom data types, via inheritance, that would carry info about the ownership), and how to control assignment, i.e. I don't want someone completely replacing someone else's resource. – Dalen Sep 05 '15 at 23:57

1 Answer


So, my general approach would be this.

  1. Have a central shared resource pool. Access through this pool would be read-only for everybody. Wrap all data in the shared pool so that no one "playing by the rules" can edit anything in it.

  2. Each agent (plugin) maintains knowledge of what it "owns" at the time it loads it. It keeps a read/write reference for itself, and registers a reference to the resource in the centralized read-only pool.

  3. When a plugin is loaded, it gets a reference to the central, read-only pool, with which it can register new resources.
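The three steps above can be sketched with the standard library's `types.MappingProxyType` (Python 3.3+), which gives a shallow read-only mapping view. This is only an analogy for the recursive proxies developed below, and the resource names are made up:

```python
import types

# step 1: the host owns the real pool; plugins never touch this dict directly
_pool = {}

# step 2: the loading plugin keeps its own read/write reference...
beep = bytearray(b"RIFF....WAVEdata")
_pool["Beep"] = beep

# step 3: ...while every plugin is handed only a read-only view of the pool
readonly_pool = types.MappingProxyType(_pool)

print(readonly_pool["Beep"] is beep)   # True: readers share the object, no copy
try:
    readonly_pool["Beep"] = b""        # rebinding through the view fails
except TypeError:
    print("view is read-only")
```

Note that the protection is shallow: the bytearray itself is still mutable through the view, which is why the implementation below wraps nested containers in proxies recursively.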

So, only addressing the issue of python native data structures (and not instances of custom classes), a fairly locked down system of read-only implementations is as follows. Note that the tricks that are used to lock them down are the same tricks that someone could use to get around the locks, so the sandboxing is very weak if someone with a little python knowledge is actively trying to break it.

import sys

try:
    # Python 3.3+ keeps the container ABCs in collections.abc (the aliases in
    # plain collections were removed in Python 3.10)
    import collections.abc as _col
except ImportError:
    import collections as _col  # Python 2

if sys.version_info >= (3, 0):
    immutable_scalar_types = (bytes, complex, float, int, str)
else:
    immutable_scalar_types = (basestring, complex, float, int, long)

# calling this will circumvent any control an object has on its own attribute lookup
getattribute = object.__getattribute__

# types that will be safe to return without wrapping them in a proxy
immutable_safe = immutable_scalar_types

def add_immutable_safe(cls):
    # decorator for adding a new class to the immutable_safe collection
    # Note: only ImmutableProxyContainer uses it in this initial
    # implementation
    global immutable_safe
    immutable_safe += (cls,)
    return cls

def get_proxied(proxy):
    # circumvent normal object attribute lookup
    return getattribute(proxy, "_proxied")

def set_proxied(proxy, proxied):
    # circumvent normal object attribute setting
    object.__setattr__(proxy, "_proxied", proxied)

def immutable_proxy_for(value):
    # Proxy for known container types, reject all others
    if isinstance(value, _col.Sequence):
        return ImmutableProxySequence(value)
    elif isinstance(value, _col.Mapping):
        return ImmutableProxyMapping(value)
    elif isinstance(value, _col.Set):
        return ImmutableProxySet(value)
    else:
        raise NotImplementedError(
            "Return type {} from an ImmutableProxyContainer not supported".format(
                type(value)))

@add_immutable_safe
class ImmutableProxyContainer(object):

    # the only names that are allowed to be looked up on an instance through
    # normal attribute lookup
    _allowed_getattr_fields = ()

    def __init__(self, proxied):
        set_proxied(self, proxied)

    def __setattr__(self, name, value):
        # never allow attribute setting through normal mechanism
        raise AttributeError(
            "Cannot set attributes on an ImmutableProxyContainer")

    def __getattribute__(self, name):
        # enforce attribute lookup policy
        allowed_fields = getattribute(self, "_allowed_getattr_fields")
        if name in allowed_fields:
            return getattribute(self, name)
        raise AttributeError(
            "Cannot get attribute {} on an ImmutableProxyContainer".format(name))

    def __repr__(self):
        proxied = get_proxied(self)
        return "{}({})".format(type(self).__name__, repr(proxied))

    def __len__(self):
        # works for all currently supported subclasses
        return len(get_proxied(self))

    def __hash__(self):
        # will error out if the proxied object is unhashable
        return hash(get_proxied(self))

    def __eq__(self, other):
        proxied = get_proxied(self)
        if isinstance(other, ImmutableProxyContainer):
            other = get_proxied(other)
        return proxied == other


class ImmutableProxySequence(ImmutableProxyContainer, _col.Sequence):

    _allowed_getattr_fields = ("count", "index")

    def __getitem__(self, index):
        proxied = get_proxied(self)
        value = proxied[index]
        if isinstance(value, immutable_safe):
            return value
        return immutable_proxy_for(value)


class ImmutableProxyMapping(ImmutableProxyContainer, _col.Mapping):

    _allowed_getattr_fields = ("get", "keys", "values", "items")

    def __getitem__(self, key):
        proxied = get_proxied(self)
        value = proxied[key]
        if isinstance(value, immutable_safe):
            return value
        return immutable_proxy_for(value)

    def __iter__(self):
        proxied = get_proxied(self)
        for key in proxied:
            if not isinstance(key, immutable_scalar_types):
                # If mutable keys are used, returning them could be dangerous.
                # If owner never puts a mutable key in, then integrity should
                # be okay. tuples and frozensets should be okay as keys, but
                # are not supported in this implementation for simplicity.
                raise NotImplementedError(
                    "keys of type {} not supported in "
                    "ImmutableProxyMapping".format(type(key)))
            yield key


class ImmutableProxySet(ImmutableProxyContainer, _col.Set):

    _allowed_getattr_fields = ("isdisjoint", "_from_iterable")

    def __contains__(self, value):
        return value in get_proxied(self)

    def __iter__(self):
        proxied = get_proxied(self)
        for value in proxied:
            if isinstance(value, immutable_safe):
                yield value
            else:
                yield immutable_proxy_for(value)

    @classmethod
    def _from_iterable(cls, it):
        return set(it)

NOTE: this is only tested on Python 3.4, but I tried to write it to be compatible with both Python 2 and 3.

Make the root of the shared resources a dictionary. Give an ImmutableProxyMapping of that dictionary to the plugins.

private_shared_root = {}
public_shared_root = ImmutableProxyMapping(private_shared_root)

Create an API where the plugins can register new resources to the public_shared_root, probably on a first-come-first-served basis (if it's already there, you can't register it). Pre-populate private_shared_root with any containers you know you're going to need, or any data you want to share with all plugins but you know you want to be read-only.

It might be convenient if the convention for keys in the shared root mapping were all strings, like file-system paths (/home/dalen/local/python) or dotted paths like Python library objects (os.path.expanduser). That way collision detection is immediate, and it is trivial and obvious when plugins try to add the same resource to the pool.
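A first-come-first-served registration along those lines might look like the following sketch. The function name and the key scheme are illustrative, not a fixed API:

```python
private_shared_root = {}  # the private root dict from above

def register(key, resource):
    # first-come-first-served: a path-like key can only be claimed once
    if key in private_shared_root:
        raise KeyError("shared resource %r is already registered" % (key,))
    private_shared_root[key] = resource

register("audio/beep.mp3", b"...mp3 bytes...")
try:
    register("audio/beep.mp3", b"other bytes")  # collision caught immediately
except KeyError:
    print("collision detected")
```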

Matt Anderson
  • Excellent! Now, I've added my current solution (no security) to the question. Perhaps you can advise on the best way to integrate your solution into mine. Let's leave the data types as (int, float, bool, long, str, unicode, tuple, list, dict, set and read-only file), for simplicity; resources in my case are data only, no access drivers or similar stuff. If you can eliminate the two pools of resources, that would be good; otherwise it doesn't really matter. – Dalen Sep 06 '15 at 23:16
  • And, just out of curiosity, how would you go about breaking into an object that already has __getattribute__() and __setattr__() defined? Hm, without using its class to reinitialize it, I don't see any other way at the moment. – Dalen Sep 06 '15 at 23:20
  • Oh yeah, I totally forgot: I use Python 2.5 to 2.7, but don't bother yourself; just a few modifications will be enough to bring 3.4 code to 2.x. It's not a problem. – Dalen Sep 06 '15 at 23:24
  • @Dalen As for your question about `__getattribute__()`, if I say `value = foo.my_attribute` and the class of `foo` has `__getattribute__()` defined, it will be called to deal with the resolution of the "dot operator" (the attribute lookup). If I instead say `value = object.__getattribute__(foo, "my_attribute")`, the `__getattribute__()` of `foo` **will not be** called; the one on `object` is called instead. To really understand the difference and the why, it helps to really know how python attribute lookup works "under the hood" (and it's documented, so it's not "abuse", per se, to use it). – Matt Anderson Sep 07 '15 at 03:26
  • I included that in my "using class" as well, but this still doesn't change the instance, does it? I mean, instance.attribute = "blah" cannot be set if __setattr__() doesn't permit it, while the class that created this instance would permit clsforinstance.attribute = "blah", because __setattr__() wouldn't be called. But that attribute would live on the class, not on the already living object. And if you don't have access to the class, just to the object, you cannot modify it. – Dalen Sep 07 '15 at 10:55
  • And if a class adds its __getattribute__() from the __init__() or __new__() methods, it wouldn't be there to use instead of the one on the instance, so there's no point in calling it at all. So any "foo" object that replaced the original instance wouldn't be able to receive that call. Remember, we are executing this in a script that may forbid accessing Python's "magic methods" from the outside anyway. Still, changing a living instance that has __setattr__() is pretty non-trivial work, without reinitialization or total replacement of that instance. – Dalen Sep 07 '15 at 11:08
  • If by object you meant the object type (not an exemplary expression), then say you do not have access to it, nor to the automatic generation of new-style classes. What then? – Dalen Sep 07 '15 at 11:20
  • I do mean the `object` builtin / type. And I'm not sure how to lock down the system such that you can't get to it, without severely limiting functionality. `type(public_shared_root).mro()[-1] == object`, `public_shared_root.keys.__class__.mro()[-1] == object`. Using the unbound method `object.__getattribute__` and giving it an instance object of some subclass type (everything) as first argument, this will let you circumvent access restrictions built into the subclass. Python is built to enable the programmer, not to put restrictions on him/her. It is really hard to lock down. – Matt Anderson Sep 07 '15 at 15:47
  • Yes, I agree, and I like Python even more for that. It ensures that there are challenges for me and, on the other hand, that developing simple, straightforward stuff is fast and easy. – Dalen Sep 07 '15 at 23:09
  • As for locking: in Python 2, it is enough that you don't pass the object builtin into the plugin's scope dictionary, and also that you do not allow importing of any module that might provide access to it. Python 2 doesn't automatically inherit from the object class (i.e. create new-style classes). In Python 3, though, it would be a problem, because a simple class blah: pass; and object = blah(); would give us back what we need. Am I correct? I still haven't switched to Python 3, and I am not planning to do so soon. This particular project is in Python 2.7. – Dalen Sep 07 '15 at 23:17
  • No. It is not enough not to pass `object` into the evaluation context. Consider `def foo(): pass` and then `print foo.__class__.__mro__[-1] == object`. This is `True` (verified on Python 2.7). So if the user can define a function, the user can recover a reference to the `object` builtin. Not passing it in means the user must be more knowledgable about python, but does not deny him or her access. – Matt Anderson Sep 07 '15 at 23:24
  • Honestly, I can't imagine a way to *actually* lock down an "untrusted" python program short of parsing the client program with `ast.parse` and then walking the AST and then only allowing white-listed operations. Do not compile and execute the plugin unless it passes the "whitelisting test". Even then, someone clever might find a loophole in the test and get arbitrary code to run. Real security is very hard. – Matt Anderson Sep 07 '15 at 23:40
  • Yes, well, I forgot that __class__ is accessible through function objects too. But I mean to riffle through the uncompiled code and not permit any "._" attribute access, nor function co_code (or whatever it is called, I can't remember now), etc. I'm even considering disabling OOP completely, i.e. plugins will be treated as an instance anyway, so let's say the file is a class and further classes inside will raise an error. There is one way of making it as secure as possible: to run it in a specially prepared interpreter. That would be a tremendous job, but if I needed 99% security I'd do it. – Dalen Sep 08 '15 at 01:05
  • P.S. You mentioned the ast module, so I went digging to see whether it is easier to do it myself or to use ast. I discovered that the ast module is available in Python 2.6 and up; before that there is only the compiler module. So I wondered whether you know which is better, ast.parse() or compiler.parse(). As far as I can see (at first glance), they both provide a similar AST, and the ast module is a rewrite of certain compiler module functionalities. If we want to go even lower, the parser module is also there. – Dalen Sep 08 '15 at 02:25
  • The `compiler` module is deprecated in 2.6+, and removed in 3.x. I've never used it. I have used the `ast` module to a limited extent (re-writing existing code in certain circumstances so only part of it would execute). Also, the `ast` module was initially written by Armin Ronacher, who I respect for having done a lot of top-notch python library development. – Matt Anderson Sep 08 '15 at 02:31