Type detection and collision avoidance at constructor time

Question

Thanks everyone for your help so far. I've narrowed it down a bit. If you look at HERE in both the script and the class, and run the script, you'll see what is going on.

The ADD line prints "789 789"

when it should print "456 789".

What appears to be happening is that __new__ detects the type of the incoming argument. However, if the incoming object has the same type as the class being constructed, it appears to page the incoming object into itself (at the class level) instead of returning the old object. That is the only thing I can think of that would cause 456 to get creamed.

So how do you detect, within a constructor, that an argument is already an instance of the class, and decide NOT to page that data into the class memory space, but instead return the previously constructed object?

Foo.py:

import sys
import math

class Foo(): 

    # class level property

    num = int(0) 

    # 
    # Python Instantiation Customs: 
    # 
    # Processing polymorphic input new() MUST return something or 
    # an object?,  but init() cannot return anything. During runtime 
    # __new__ is running at the class level, while init is running 
    # at the instance level. 
    # 

    def __new__(self,*arg): 

        print ("arg type: ", type(arg[0]).__name__)

###  functionally the same as isinstance() below
#
#       if (type(arg[0]).__name__) == "type": 
#           if arg[0].__name__ == "Foo":
#               print ("\tinput was a Foo")
#               return arg[0] # objects of same type intercede

### HERE <------------------------------------- 
# 
# this creams ALL instances, because since we are a class 
# the properties of the incoming object seem to override 
# the class, rather than exist as a separate data structure. 

        if (isinstance(arg[0], Foo)): 
            print ("\tinput was a Foo")
            return arg[0] # objects of same type intercede

        elif (type(arg[0]).__name__) == "int": 
            print ("\tinput was an int")
            self.inum = int(arg[0]) # integers store
            return self

        elif (type(arg[0]).__name__) == "str": 
            print ("\tinput was a str")
            self.inum = int(arg[0]) # strings become integers
            return self

        return self 

    def __init__(self,*arg):
        pass

    # 
    # because if I can do collision avoidance, I can instantiate 
    # inside overloaded operators: 
    # 

    def __add__(self,*arg): 

        print ("add operator overload")

        # no argument returns self

        if not arg: 
            return self

        # add to None or zero return self

        if not arg[0]: 
            return self

        knowntype = Foo.Foo(arg[0])

        # add to unknown type returns False

        if not knowntype: 
            return knowntype

        # both values are calculable, calculate and return a Foo

        typedresult = (self.inum + knowntype.inum) 

        return Foo.Foo(typedresult) 

    def __str__(self): # return a stringified int or empty string

        # since integers don't have character length, 
        # this tests the value, not the existence of:  

        if self.inum: 
            return str(self.inum)

        # so the property could still be zero and we have to 
        # test again for no reason. 

        elif self.inum == 0:
            return str(self.inum)   

        # return an empty str if nothing is defined. 

        return str("")

testfoo.py:

#! /usr/bin/python

import sys
import Foo 

# A python class is not transparent like in perl, it is an object 
# with unconditional inheritance forced on all instances that share 
# the same name. 

classhandle = Foo.Foo 

# The distinction between the special class object, and instance 
# objects is implicitly defined by whether there is a passed value at 
# constructor time. The following therefore does not work. 

# classhandle = Foo.Foo() 

# but we can still write and print from the class, and see it propagate, 
# without having any "object" memory allocated.  

print ("\nclasshandle: ", classhandle)
print ("classhandle classname: ", classhandle.__name__) # print the classname
print ("class level num: ", classhandle.num)     # print the default num
classhandle.classstring = "fdsa" # define an involuntary value for all instances

print ("\n")

# so now we can create some instances with passed properties. 

instance1 = Foo.Foo(int(123)) # 

print ("\ninstance1: ", instance1)
print ("involuntary property derived from special class memory space: ", instance1.classstring)
print ("instance property from int: ", instance1.inum)

print ("\n")

instance2 = Foo.Foo(str("456"))
print ("\ninstance2: ", instance2)
print ("instance2 property from int: ", instance2.inum)

# 
# instance3 stands for (shall we assume) some math that happened a 
# thousand lines ago in a class far far away. We REALLY don't 
# want to go chasing around to figure out what type it could possibly 
# be, because it could be polymorphic itself. Providing a black box so 
# that you don't have to do that, is after all, the whole point of OOP. 
# 

print ("\npretend instance3 is unknowingly already a Foo")
instance3 = Foo.Foo(str("789"))

## So our class should be able to handle str,int,Foo types at constructor time. 

print ("\ninstance4 should be a handle to the same memory location as instance3")

instance4 = Foo.Foo(instance3) # SHOULD return instance3 on type collision

# because if it does, we should be able to hand all kinds of garbage to 
# overloaded operators, and they should remain type safe.  

# HERE <-----------------------------
# 
# the creation of instance4, changes the instance properties of instance2: 
# below, the instance properties inum, are now both "789". 

print ("ADDING: ", instance2.inum, " ", instance4.inum)

# instance6 = instance2 + instance4 # also should be a Foo object
# instance5 = instance4 + int(549) # instance5 should be a Foo object.

This may be a pedantic point, but I think it's important, "Those input types may be primitives or objects." There are no primitive types in Python. Everything is an object. — juanpa.arrivillaga, Jul 30 '18 at 04:09
"everything is an object": To me that would appear to be incorrect. Within __new__ (constructor phase?) type() does not find the namespace of the passed structure, but it does in __init__. In perl terms this would mean that the data structure is unblessed in __new__, and not yet an object. This is a compelled custom. There is no reason for it, other than to make a bunch of people look at the code from a common perspective, as far as I can tell. — James Aanderson, Jul 30 '18 at 16:45
I don't understand what you are saying here, and how it relates to *everything being an object* in Python. Again, there *are no primitive types*. When you say "type() does not find the namespace of the passed structure" it is unclear what you mean. What passed structure? What namespace? Namespaces are actually just objects in Python, typically (although not always) `dict` objects. You seem to be trying to apply concepts from Perl which do not apply to Python; there are no "blessed and unblessed" objects. — juanpa.arrivillaga, Jul 30 '18 at 17:44
In any event, there is a very good answer here describing your mistake vis a vis `__new__` vs `__init__`. Moreover, this construction `(type(arg[0]).__name__) == "int"` is **not** how you should do type-checking in Python. If you want a specific type, you use `type(obj) is int`, if you want to handle subclass relationships, use `isinstance(obj, int)`. Is the `__name__` attribute what you were referring to as the namespace? — juanpa.arrivillaga, Jul 30 '18 at 17:47
Using type(obj) in new() returns the string "type", not the type of the object. In perl the bless() function binds the namespace to the memory address of the data structure. That is how it becomes an object. In Python, this is implicit. But to instantiate a new object only in the absence of a previous object, you have to know where the namespace binds to the data structure. The constructor has to check the type of the incoming argument before instantiating. — James Aanderson, Jul 30 '18 at 17:55
Yes, `type(firstargument)` in `__new__` will return `type`, since the first argument is a *class*, which is just an object of type `type`. The *instance doesn't exist at this point*, indeed, `__new__` is tasked with creating the instance. Have you looked at abarnert's answer? I am still unsure what you mean by namespace here. — juanpa.arrivillaga, Jul 30 '18 at 18:10
It seems like you just need to call `type` or `isinstance` on `args[0]` instead of `cls` in your `__new__(cls, *args)`. Is that the only problem you're having here? In case it isn't clear: the `cls` parameter will get the class that you're trying to construct an instance of (which will be `Foo`, or a subclass of `Foo`); the other parameters will get the arguments passed to the constructor. — abarnert, Jul 30 '18 at 18:20
Thanks everyone for your help. I understand the problem a little better now, but I still don't have a solution. — James Aanderson, Jul 30 '18 at 19:00
Again, you are treating `self` as the *instance*. Inside `__new__`, `self` will be the *class*. So you have to **actually construct the instance**. See abarnert's answer. For instance, in your `(type(arg[0]).__name__) == "str"` branch (again with that very hacky construction that you *shouldn't* be using) you assign `self.inum = int(arg[0])`, which assigns *to the class object*, creating a *class variable* (i.e. static variable), and then you `return self`, but `self` is the *class object*, so now you have a constructor that returns the class, not an instance... — juanpa.arrivillaga, Jul 30 '18 at 19:28
So in __new__ I am operating at the class level, and at __init__ I am operating at the instance level. So, at the instance level, how do I transpose a previously existing object, into the place of the current one, without depending on the user to do it? I can't just return the other instance from __init__. I can't transpose them within __new__ because it collides with the class memory space. I get the problem. What I don't get is how to avoid 50 layers of abstraction, if you can't write self aware objects. Registries create problems of their own. I'd rather not use them. — James Aanderson, Jul 30 '18 at 20:12
@JamesAanderson you can do it in `__new__`, just like abarnert's answer. `instance = super().__new__(cls); instance.inum = int(arg[0]); return instance`. Remember, `__new__` should be returning an instance, not `self`, which is actually the class, which is why by convention it will be named `cls` when implementing `__new__`, just to remind you. — juanpa.arrivillaga, Jul 30 '18 at 20:38
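
The advice in the comments above can be put together roughly as follows. This is only a minimal sketch, assuming the constructor takes a single argument and that unsupported types should raise `TypeError` (the question does not say what should happen in that case):

```python
class Foo:
    """Wraps an integer; accepts an int, a numeric str, or an existing Foo."""

    def __new__(cls, value):
        # cls is the class being constructed, not an instance.
        if isinstance(value, Foo):
            # Already a Foo: hand back the same object instead of copying it.
            return value
        if isinstance(value, (int, str)):
            # Build a real instance first, then store data on that instance.
            instance = super().__new__(cls)
            instance.inum = int(value)
            return instance
        # Assumption: reject anything else rather than return the class.
        raise TypeError(f"unsupported type: {type(value).__name__}")


a = Foo("456")
b = Foo(789)
c = Foo(b)
print(a.inum, b.inum)  # 456 789
print(c is b)          # True
```

Because `inum` is set on the object returned by `super().__new__(cls)`, nothing is written to the class, so one instance can no longer overwrite another.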

abarnert · Answer 1 · 2018-07-30T18:21:48.933

How do I, at constructor time, return a non-new object?

By overriding the constructor method, __new__, not the initializer method, __init__.

The __new__ method constructs an instance—normally by calling the super's __new__, which eventually gets up to object.__new__, which does the actual allocation and other under-the-covers stuff, but you can override that to return a pre-existing value.

The __init__ method is handed a value that's already been constructed by __new__, so it's too late for it to not construct that value.

Notice that if Foo.__new__ returns a Foo instance (whether a newly-created one or an existing one), Foo.__init__ will be called on it. So, classes that override __new__ to return references to existing objects generally need an idempotent __init__—typically, you just don't override __init__ at all, and do all of your initialization inside __new__.
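
To illustrate why the initializer needs to be idempotent, here is a small sketch. The class name `Cached` and its single-slot cache `_only` are made up for this example:

```python
class Cached:
    _only = None  # hypothetical single cached instance

    def __new__(cls, value):
        if cls._only is None:
            cls._only = super().__new__(cls)
        return cls._only

    def __init__(self, value):
        # Runs on every Cached(...) call, even when __new__ returned
        # the existing object, so it overwrites earlier state.
        self.value = value


first = Cached(1)
second = Cached(2)
print(first is second)  # True
print(first.value)      # 2
```

The second call did not create anything new, yet `__init__` still ran and replaced `value` on the shared object.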

There are lots of examples of trivial __new__ methods out there, but let's show one that actually does a simplified version of what you're asking for:

class Spam:
    _instances = {}
    def __new__(cls, value):
        if value not in cls._instances:
            cls._instances[value] = super().__new__(cls)
            cls._instances[value].value = value
        return cls._instances[value]

Now:

>>> s1 = Spam(1)
>>> s2 = Spam(2)
>>> s3 = Spam(1)
>>> s1 is s2
False
>>> s1 is s3
True

Notice that I made sure to use super rather than object, and cls._instances¹ rather than Spam._instances. So:

>>> class Eggs(Spam):
...     pass
>>> e4 = Eggs(4)
>>> Spam(4)
<__main__.Eggs at 0x12650d208>
>>> Spam(4) is e4
True
>>> class Cheese(Spam):
...     _instances = {}
>>> c5 = Cheese(5)
>>> Spam(5)
<__main__.Spam at 0x126c28748>
>>> Spam(5) is c5
False

However, it may be a better option to use a classmethod alternate constructor, or even a separate factory function, rather than hiding this inside the __new__ method.

For some types—like, say, a simple immutable container like tuple—the user has no reason to care whether tuple(…) returns a new tuple or an existing one, so it makes sense to override the constructor. But for some other types, especially mutable ones, it can lead to confusion.

The best test is to ask yourself whether this (or similar) would be confusing to your users:

>>> f1 = Foo(x)
>>> f2 = Foo(x)
>>> f1.spam = 1
>>> f2.spam = 2
>>> f1.spam
2

If that can't happen (e.g., because Foo is immutable), override __new__.
If that's exactly what users would expect (e.g., because Foo is a proxy to some object that has the actual spam, and two proxies to the same object had better see the same spam), probably override __new__.
If it would be confusing, probably don't override __new__.

For example, with a classmethod:

>>> f1 = Foo.from_x(x)
>>> f2 = Foo.from_x(x)

… it's a lot less likely to be surprising if f1 is f2 turns out to be true.
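
A possible `from_x` implementation might look like the sketch below. The `_cache` dictionary and the `x` attribute are assumptions for illustration only; the answer does not specify how `from_x` is implemented:

```python
class Foo:
    _cache = {}  # hypothetical registry keyed by x

    def __init__(self, x):
        self.x = x

    @classmethod
    def from_x(cls, x):
        # Explicitly named: may return a shared, previously built object.
        if x not in cls._cache:
            cls._cache[x] = cls(x)
        return cls._cache[x]


f1 = Foo.from_x(10)
f2 = Foo.from_x(10)
f3 = Foo(10)
print(f1 is f2)  # True
print(f1 is f3)  # False
```

The plain constructor keeps its usual meaning (always a new object), while the sharing behavior is confined to a method whose name signals it.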

¹ Even though you define __new__ like an instance method, and its body looks like a class method, it's actually a static method that gets passed the class you're trying to construct (which will be Spam or a subclass of Spam) as an ordinary first parameter, with the constructor arguments (and keyword arguments) passed after that.
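
The footnote can be checked directly. This is a small sketch using a stripped-down `Spam` (without the registry from the example above):

```python
class Spam:
    def __new__(cls, value):
        instance = super().__new__(cls)
        instance.value = value
        return instance


# Python wraps __new__ as a staticmethod automatically at class creation.
print(isinstance(Spam.__dict__["__new__"], staticmethod))  # True

# Because it is static, the class must be passed explicitly
# when calling __new__ directly.
s = Spam.__new__(Spam, 3)
print(type(s).__name__, s.value)  # Spam 3
```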

@martineau Done. I think I came up with the simplest example that's relevant to the OP's intended use. — abarnert, Jul 30 '18 at 04:20
@martineau Do you think it also needs a sample `from_x` alternate constructor implementation, or is the answer already too long and that part is obvious enough? — abarnert, Jul 30 '18 at 04:26
In the first code example, this looks like you're creating an object registry. I don't need to do that. I'm not looking to validate against a specific previous object, only against the namespace of any object. I'm trying to normalize different input types, so that my exported interface is type agnostic. To do that, I have to know what my input types are, but I only have to know what they are, not which they are. — James Aanderson, Jul 30 '18 at 18:08
@JamesAanderson you can still do that using this approach, just check `type(value)` and do the processing you have in your `__init__` instead, e.g. something to the effect of `if isinstance(value, str): return super().__new__(cls, int(value))` EDIT rather, you have to add the attribute to what is returned by `super().__new__` as in the example above — juanpa.arrivillaga, Jul 30 '18 at 18:15
@JamesAanderson I just needed an example of _something_ that has a reason to not always return a new object in `__new__`, and a flyweight-registry was the simplest example I could think of, so the logic of my example wouldn't get in the way of the part you need to understand. If that's not good enough for you to understand the idea and build what you need, please let me know what's not clear and I'll update it. — abarnert, Jul 30 '18 at 18:15

James Aanderson · Accepted Answer · 2018-08-01T14:24:46.893

Thanks everyone who helped! This answer was sought in order to understand how to refactor an existing program that was already written, but that was having scalability problems. The following is the completed working example. What it demonstrates is:

The ability to test incoming types and avoid unnecessary object duplication at constructor time, given incoming types that are both user-defined and built-in. The ability to construct on the fly from a redefined operator or method. These capabilities are necessary for writing scalable, supportable API code. YMMV.

Foo.py

import sys
import math

class Foo(): 

    # class level property

    num = int(0) 

    # 
    # Python Instantiation Customs: 
    # 
    # Processing polymorphic input new() MUST return something or 
    # an object, but init() may not return anything. During runtime 
    # __new__ is running at the class level, while __init__ is 
    # running at the instance level. 
    # 

    def __new__(cls,*arg): 

        print ("arg type: ", type(arg[0]).__name__)

        # type(arg[0]).__name__ is the name of the argument's class. 
        # Note that `type` itself is the public built-in metaclass 
        # that every class is an instance of. 

        # comparing names is close to isinstance(), but not identical: 
        # it ignores subclasses and matches any class named "Foo". 

        if (type(arg[0]).__name__) == "Foo": 
            fooid = id(arg[0])
            print ("\tinput was a Foo: ", fooid)
            return arg[0] # objects of same type intercede

        # super().__new__(cls) resolves to object.__new__, which 
        # allocates a new, empty instance of cls (not of object). 
        # Attributes set on that instance stay on the instance 
        # instead of landing on the class.  

        elif (type(arg[0]).__name__) == "int": 
            self = super().__new__(cls)
            self.inum = int(arg[0]) # integers store
            fooid = id(self)
            print ("\tinput was an int: ", fooid)
            return (self)

        elif (type(arg[0]).__name__) == "str": 
            self = super().__new__(cls)
            self.inum = int(arg[0]) # strings become integers
            fooid = id(self)
            print ("\tinput was a str: ", fooid)
            return (self)

#   def __init__(self,*arg):
#       pass

    # 
    # because if I can do collision avoidance, I can instantiate 
    # inside overloaded operators: 
    # 

    def __add__(self,*arg): 

        argtype = type(arg[0]).__name__

        print ("add overload in class:", self.__class__)

        if argtype == "Foo" or argtype == "str" or argtype == "int":   

            print ("\tfrom a supported type")

            # early exit for zero

            if not arg[0]: 
                return self

            # localized = Foo.Foo(arg[0])

            # FAILS: AttributeError: type object 'Foo' has no attribute 'Foo'
            # Inside Foo.py the name Foo is the class, not the module, 
            # so the constructor is called as Foo(...) rather than Foo.Foo(...)


            localized = Foo(arg[0])

            print ("\tself class: ", self.__class__)
            print ("\tself number: ", self.inum)
            print ()
            print ("\tlocalized class: ", localized.__class__)
            print ("\tlocalized number: ", localized.inum)
            print ()

            answer = (self.inum + localized.inum) 
            answer = Foo(answer)    

            print ("\tanswer class:", answer.__class__)
            print ("\tanswer sum result:", answer.inum)

            return answer

        # raise instead of assert: asserts are stripped when running with -O
        raise TypeError("Foo: cannot add an unsupported type")

    def __str__(self): # return a stringified int

        # Allow the class to stringify as if it were an int. 
        # __str__ must always return a str, so negative values 
        # are stringified too instead of falling through to None. 

        return str(self.inum)

testfoo.py

#! /usr/bin/python

import sys
import Foo 

# A python class is not transparent like in perl, it is an object 
# with unconditional inheritance forced on all instances that share 
# the same name. 

classhandle = Foo.Foo 

# The distinction between the special class object, and instance 
# objects is implicitly defined by whether there is a passed value at 
# constructor time. The following therefore does not work. 

# classhandle = Foo.Foo() 

# but we can still write and print from the class, and see it propagate, 
# without having any "object" memory allocated.  

print ("\nclasshandle: ", classhandle)
print ("classhandle classname: ", classhandle.__name__) # print the classname
print ("class level num: ", classhandle.num)     # print the default num
classhandle.classstring = "fdsa" # define an involuntary value for all instances

print ("\n")

# so now we can create some instances with passed properties. 

instance1 = Foo.Foo(int(123)) # 

print ("\ninstance1: ", instance1)
print ("involuntary property derived from special class memory space: ", instance1.classstring)
print ("instance property from int: ", instance1.inum)

print ("\n")

instance2 = Foo.Foo(str("456"))
print ("\ninstance2: ", instance2)
print ("instance2 property from int: ", instance2.inum)

# 
# instance3 stands for (shall we assume) some math that happened a 
# thousand lines ago in a class far far away. We REALLY don't 
# want to go chasing around to figure out what type it could possibly 
# be, because it could be polymorphic itself. Providing a black box so 
# that you don't have to do that, is after all, the whole point of OOP. 
# 

print ("\npretend instance3 is unknowingly already a Foo\n")
instance3 = Foo.Foo(str("789"))

## So our class should be able to handle str,int,Foo types at constructor time. 

print ("\ninstance4 should be a handle to the same memory location as instance3\n")

instance4 = Foo.Foo(instance3) # SHOULD return instance3 on type collision

print ("instance4: ", instance4) 

# because if it does, we should be able to hand all kinds of garbage to 
# overloaded operators, and they should remain type safe.  

# since we are now different instances these are now different:  

print ("\nADDING:_____________________\n", instance2.inum, " ", instance4.inum)

instance5 = instance4 + int(549) # instance5 should be a Foo object. 
print ("\n\tAdd instance4, 549, instance5: ", instance4, " ", int(549), " ", instance5, "\n")

instance6 = instance2 + instance4 # also should be a Foo object
print ("\n\tAdd instance2, instance4, instance6: ", instance2, " ", instance4, " ", instance6, "\n")

print ("stringified instance6: ", str(instance6))
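
One caveat about the comment in `__new__` saying the name comparison is functionally the same as `isinstance()`: the two diverge for subclasses and for unrelated classes that share the name. A small sketch follows; `SubFoo` and `Impostor` are made-up names for illustration:

```python
class Foo:
    pass


class SubFoo(Foo):
    pass


# An unrelated class that merely happens to be named "Foo".
Impostor = type("Foo", (), {})

sub = SubFoo()
fake = Impostor()

print(type(sub).__name__ == "Foo")   # False: the subclass is missed
print(isinstance(sub, Foo))          # True

print(type(fake).__name__ == "Foo")  # True: the name alone matches
print(isinstance(fake, Foo))         # False
```

For the code above this only matters once `Foo` is subclassed or another `Foo` exists elsewhere, but `isinstance(arg[0], Foo)` avoids both cases.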
