
High-level picture of my program

  • purpose: parse an XML file and save its text into similar Python objects
  • problem: every time I create a new Python object and append it to a list, the list seems to receive a reference to a previously created object instead of a new one.

Summary of what my intended structure should be:

list of applications that each contains a list of connections

app1: 
     connection1
     connection2
app2:
     connection3
     connection4
     connection5

So that's a summary of what it should do. Here is my main function:

def main(self):
    root = get_xml_root()
    root.get_applications()
    for application in root.applications:
        application.get_connections()           ## this is where the memory goes bad!!!
        for connection in application.connections:
              connection.do_something()

How I know there is a memory problem:

  • When I change one thing in one list of connections that belong to a particular application, the connections in another application will also change.
  • I printed out the memory locations for the connections and found that there are duplicate references (see memory prints)

Memory print outs

  • when I printed out application locations I got the following (it's not pretty, but you can see that at least the addresses are different):

generator_libraries.data_extraction.extraction.Application_XML instance at 0x15a07e8 - memory location = 22677480
generator_libraries.data_extraction.extraction.Application_XML instance at 0x15a0758 - memory location = 22677336
generator_libraries.data_extraction.extraction.Application_XML instance at 0x15a0830 - memory location = 22677552
generator_libraries.data_extraction.extraction.Application_XML instance at 0x15a0878 - memory location = 22677624
generator_libraries.data_extraction.extraction.Application_XML instance at 0x15a08c0 - memory location = 22677696
generator_libraries.data_extraction.extraction.Application_XML instance at 0x15a0908 - memory location = 22677768
generator_libraries.data_extraction.extraction.Application_XML instance at 0x15a0950 - memory location = 22677840
generator_libraries.data_extraction.extraction.Application_XML instance at 0x15a0998 - memory location = 22677912
generator_libraries.data_extraction.extraction.Application_XML instance at 0x15a09e0 - memory location = 22677984
generator_libraries.data_extraction.extraction.Application_XML instance at 0x15a0a28 - memory location = 22678056

  • when I printed out connection locations for 3 different applications I got the following (you can see the duplication among addresses):

    • app1:
    • memory location = 22721168
    • memory location = 22721240
    • memory location = 22721024
    • memory location = 22721600
    • memory location = 22721672

    • app2:
    • memory location = 22721240
    • memory location = 22721672
    • memory location = 22721600
    • memory location = 22721168
    • memory location = 22722104
    • memory location = 22722176

Conclusions from memory analysis

It seems that every time I create a new connection object and append it to my "connections" list, instead of creating a new object, it takes the memory reference from my previous objects.

A more detailed view of the problematic function's code

class Application_XML(XML_Element_Class):
    name = None
    connections=copy.deepcopy([])
    xml_element=None
    def get_connections(self):
        xml_connections = self.get_xml_children()
        for xml_connection in xml_connections:
            connection = None       ## reset the connection variable
            connection = Connection_XML(xml_connection)
            connection.unique_ID = connection_count
            self.connections.append(copy.deepcopy(connection))
            del connection      ## reset where its pointing to
            connection_count+=1
        self.log_debugging_info_on_connection_memory()   ### this is where I look at memory locations

A class that does the same thing... but works

class Root_XML(XML_Element_Class):
    applications = copy.deepcopy([])
    def get_applications(self):
        xml_applications = self.get_xml_children()
        for xml_application in xml_applications:
            self.applications.append(Application_XML(xml_application))
        self.log_application_memory_information()

If it is any help, here is the connection class:

class Connection_XML(XML_Element_Class):
    ### members
    name = None
    type = None
    ID = None
    max_size = None
    queue_size = None
    direction = None
    def do_something(self):
        pass

Final Words

I have tried nearly every trick in the book in terms of alternate ways of creating the objects, destroying them after I make them... but nothing has helped yet. I feel that there may be an essential python memory concept behind the answer... but after all my searching online, nothing has shed any light onto the answer.

Please, if you can help that would be awesome!!! Thanks :)

Pswiss87
  • Welcome to Python. [Here's a brief guide to Python variable semantics from a C++ perspective.](http://rg03.wordpress.com/2007/04/21/semantics-of-python-variable-names-from-a-c-perspective/) – user2357112 Aug 23 '13 at 00:42
  • Wait, you have Python questions from January. I'm surprised you didn't get bitten by this for 8 months. – user2357112 Aug 23 '13 at 00:45
  • Bug: Unlike other languages, assigning to a variable at class scope doesn't declare an instance field; it sets a class attribute. If you want instance attributes, set `self.attribute_name = whatever_value` in the `__init__` method. – user2357112 Aug 23 '13 at 00:51
  • It would really help if you posted an [SSCCE](http://sscce.org) that runs and demonstrates your problem that we can fix for you, instead of just posting fragments of code and a description. – abarnert Aug 23 '13 at 00:54
  • In addition to the comments about class versus instance variables, `connections=copy.deepcopy([])` is redundant. A new list is created with `connections=[]` (or more likely `self.connections=[]`), you don't need to copy it. – tdelaney Aug 23 '13 at 00:56
  • @user2357112 you were correct when you mentioned that the issue was with using class attributes instead of instance attributes. That fixed the problem. – Pswiss87 Aug 23 '13 at 20:04
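
The fix described in these comments can be sketched roughly as follows. This is only an illustration under assumptions: `XMLElement` is a hypothetical stand-in for the question's `XML_Element_Class` (whose real constructor is not shown), plain strings stand in for XML child nodes, and the ID counter restarts per application for simplicity.

```python
class XMLElement(object):
    """Hypothetical stand-in for XML_Element_Class from the question."""

    def __init__(self, children=None):
        # Copy the input so each element owns its own list of children.
        self.children = list(children or [])

    def get_xml_children(self):
        return self.children


class ConnectionXML(XMLElement):
    def __init__(self, name):
        XMLElement.__init__(self)
        self.name = name          # instance attribute
        self.unique_ID = None     # instance attribute


class ApplicationXML(XMLElement):
    def __init__(self, children=None):
        XMLElement.__init__(self, children)
        self.name = None
        self.connections = []     # a fresh list for every instance

    def get_connections(self):
        # No deepcopy or del needed: each call to ConnectionXML(...) already
        # creates a new object.
        for count, child in enumerate(self.get_xml_children()):
            connection = ConnectionXML(child)
            connection.unique_ID = count
            self.connections.append(connection)


app1 = ApplicationXML(["connection1", "connection2"])
app2 = ApplicationXML(["connection3", "connection4", "connection5"])
app1.get_connections()
app2.get_connections()

print(len(app1.connections))                   # 2
print(len(app2.connections))                   # 3
print(app1.connections is app2.connections)    # False
```

Because `self.connections = []` runs once per instance inside `__init__`, each application appends to its own list and the duplicated addresses from the question should no longer appear.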

1 Answer

I don't think the problem has anything to do with the part you're looking at, but rather with the Connection_XML class:

class Connection_XML(XML_Element_Class):
    ### members
    name = None
    type = None
    ID = None
    max_size = None
    queue_size = None
    direction = None
    def do_something(self):
        pass

All of those members are class attributes. There's a single name shared by every Connection_XML instance, a single type, etc. So, even if your instances are all unique objects, changing one changes all of them.

You want instance attributes—a separate name, etc., for each instance. The way you do that is to just create the attributes dynamically, usually in the __init__ method:

class Connection_XML(XML_Element_Class):
    def __init__(self):
        self.name = None
        self.type = None
        self.ID = None
        self.max_size = None
        self.queue_size = None
        self.direction = None
    def do_something(self):
        pass

It's hard to be sure this is your problem without a real SSCCE. In this toy example, all of the attributes have the value None, which is immutable, so it won't really lead to these kinds of problems. But if one of them is, say, a list, or an object that has its own attributes, it will.
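
To make that distinction concrete, here is a small sketch. The class `Shared` and its attributes are made up for illustration and are not from the question's code.

```python
class Shared(object):
    name = None   # class attribute holding an immutable value
    tags = []     # class attribute: one list object shared by all instances


a = Shared()
b = Shared()

a.name = "first"     # assignment creates an instance attribute on a only
a.tags.append("x")   # mutates the single class-level list in place

print(b.name)            # None
print(b.tags)            # ['x']
print(a.tags is b.tags)  # True
```

Assigning to `a.name` leaves `b` untouched, while appending through `a.tags` is visible from `b`, because both names resolve to the same list stored on the class.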

abarnert
  • I think it's `Application_XML` with the `connections` class attribute. That one's a list. – user2357112 Aug 23 '13 at 00:54
  • You were absolutely right! I needed to make instance attributes instead of class attributes. Apparently it's not a problem with immutable types, but with mutable types like lists (what I was using), if you change one, you change them all. Once I made them instance attributes the problem was fixed. Thanks for your help! – Pswiss87 Aug 23 '13 at 20:02
  • @Pswiss87: Well, technically, it's not even really a problem with mutable types. If you just re-bind a class attribute name that happened to be holding a list, you'll get an instance attribute; it's only if you mutate the list in-place (e.g., with `append`) that you're affecting the shared value. But really, even if there _isn't_ a problem, it's much clearer to use class attributes only for things that are intended to be shared as-is (such as constants), not for things that many instances will replace. – abarnert Aug 23 '13 at 20:10
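
The rebinding versus in-place mutation point in the last comment can be checked with `vars()`, which shows what is stored on the instance itself. This is a minimal sketch; `Box` is a hypothetical class used only for illustration.

```python
class Box(object):
    items = []   # class attribute


a, b, c = Box(), Box(), Box()

a.items = ["rebound"]       # rebinding: creates an instance attribute on a
b.items.append("mutated")   # in-place mutation: changes the list on Box

print(vars(a))     # {'items': ['rebound']}
print(vars(b))     # {}
print(c.items)     # ['mutated']
print(Box.items)   # ['mutated']
```

Only `a` ends up with its own `items` entry. `b` still has no instance attribute, so its `append` went to the class-level list that `c` also sees.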