18

Ok, let me explain the problem with a simple example:

l = [[0]]*3       # makes the array [[0], [0], [0]]
l[0][0] = 42      # l becomes [[42], [42], [42]]
from copy import deepcopy
m = deepcopy(l)   # m becomes [[42], [42], [42]]
m[0][0] = 2       # m becomes [[2], [2], [2]]

This is a basic shared reference problem. Except usually, when a problem like this occurs, deepcopy is our friend. Currently, I made this to solve my deepcopy betrayal problem:

l = [[0]]*3       # makes the array [[0], [0], [0]]
import JSON
m = JSON.loads(JSON.dumps(l)) # m becomes [[0], [0], [0]] with no self references

I am looking for a less inefficient and less stupid way of handling self shared reference.

Of course I wouldn't make arrays like that on purpose, but I need to handle the case where someone gives one to my code. Running my "solution" on large arrays is slow, and I have many levels of nested arrays, I can't afford to make a string this big for those beasts.

smci
  • 32,567
  • 20
  • 113
  • 146
Benoît P
  • 3,179
  • 13
  • 31
  • Are you only making deep copies of an impromptu list like `[0]`? Or an actual object? – r.ook Jan 31 '19 at 18:44
  • 1
    So the shared reference could be anything? A custom object, a dict? Or just list? – Dani Mesejo Jan 31 '19 at 18:45
  • 7
    "but I need to handle the case where someone gives one to my code" - if someone gives you a broken input, their code is broken, and it's their problem. Trying to fix it for them just hides their bug and makes it harder for them to become aware of it. They need to learn to build nested lists correctly. – user2357112 Jan 31 '19 at 18:46
  • The object is a mix of dictionaries and lists – Benoît P Jan 31 '19 at 18:47
  • 1
    And the depth of the list is arbitrary? – Dani Mesejo Jan 31 '19 at 18:48
  • yes, I am looking for a general solution – Benoît P Jan 31 '19 at 18:49
  • 1
    You can write your own function to copy your list, depending on what exactly the structure you are expecting is and how you want to handle things. – juanpa.arrivillaga Jan 31 '19 at 18:49
  • A similar approach but without `json` (so perhaps quicker) would be `ast.literal_eval(str(l))` – AChampion Jan 31 '19 at 18:50
  • @juanpa.arrivillaga but that is hardly more efficient than `JSON.loads(JSON.dumps(l))` as posted in the question, especially if the depth of `list/dict` is arbitrary. – r.ook Jan 31 '19 at 18:50
  • 1
    @Idlehands sure it is, you don't make a totally unnecessary string. At the end of the day, you need to create a copy, but at least that would copy things directly. Now, if you really wanted to be efficient, you would have to walk the structure and check for duplicate list objects, and only make a copy of those in-place (hopefully avoiding a lot of necessary copies, but note,for the example, it would be the same) – juanpa.arrivillaga Jan 31 '19 at 18:52
  • @juanpa.arrivillaga Don't get me wrong that's what I thought too, but then I realize at some point you'd have to create a `set` to check for duplicates anyhow. And now it's just down to the definition of "efficient", whether it means concise code, or run time. Perhaps I'm just overthinking it. – r.ook Jan 31 '19 at 18:55
  • @Idlehands "concise code" is generally not what I take to be "efficient". And yeah, but a set of `int` objects isn't a big deal, at least compared to copying a bunch of list objects unecessarily – juanpa.arrivillaga Jan 31 '19 at 18:57
  • as @user2357112 said, i think this is the wrong thing to want to do. If you want to copy a list where all elements point to the same object you should get a list which also has all its elements point to one object, otherwise it's not really a copy, it's a different thing – sam46 Jan 31 '19 at 19:10
  • 1
    I agree with @BanishedBot, this is not copying. Deepcopy ensures the resulting object has no common reference with the input object. But it can't do anything about common references within the input object. In fact, why do you have such an input object to start with if you don't want common references? The common references should be there for a reason. – Tim Jan 31 '19 at 19:13
  • If all you have is lists, dictionaries, strings and built-in numeric types then you can write code that traverses the structure and creates copies of everything. That would be an awful lot of work just to deal with broken input. – Stop harming Monica Jan 31 '19 at 19:19
  • Strictly, your input is not a list, it's a list-of-lists (or according to you, arbitrarily nested list and/or dict). Since this particular list-of-lists has depth two, you will need two deepcopies to unpack that. In general you'll just want to recurse every time you find a nested structure. – smci Jan 31 '19 at 19:29
  • 1
    A potential problem can be that you have a cyclic structure. For example `l = []; l.append(l)`. In that case the "copying" mechanism will never quit. – Willem Van Onsem Feb 01 '19 at 09:24
  • So does every solution. It is impossible to remove a cyclical self reference. – Benoît P Feb 01 '19 at 09:27
  • @BenoîtPilatte: well the question is more a matter of *what* you want to do in that case. The cycles can be detected. But "removing" is a bit vague. In that case you want to remove the elements from the list? – Willem Van Onsem Feb 01 '19 at 09:35
  • I think it will raise a `RecursionError: maximum recursion depth exceeded`, and I have no intention of catching it. – Benoît P Feb 01 '19 at 10:17

7 Answers7

10

Here's an approach that will work on any combination of lists, dicts, and immutable values.

def very_deep_copy(obj):
    if isinstance(obj, list):
        return [very_deep_copy(item) for item in obj]
    elif isinstance(obj, dict):
        return {k: very_deep_copy(v) for k,v in obj.items()}
    else:
        return obj

l = [[0]]*3 
m = very_deep_copy(l)
m[0][0] = 2
print(m)

Result:

[[2], [0], [0]]
Kevin
  • 74,910
  • 12
  • 133
  • 166
  • but then `m` isn't an exact copy of `l`. but I guess this is what OP wants – sam46 Jan 31 '19 at 19:03
  • for further readers: this solution will fail on recursive structures like `x = []; x.append(x)` (which not so common, but we should bear them in mind) with `RecursionError` – Azat Ibrakov Feb 01 '19 at 12:17
7

I'm going to challenge the assumption that the right thing to do is to copy the shared objects. You say that

Of course I wouldn't make arrays like that on purpose, but I need to handle the case where someone gives one to my code.

but if someone passes you an input with unexpected object sharing, their code has a bug. If your code notices the bug, your code should tell them about it by throwing an exception, to help them fix their bugs.

Most code would just assume the input doesn't have any unwanted object sharing. If you want to detect it anyway, a manual traversal is probably your best option, especially since your input is expected to be JSON-serializable:

def detect_duplicate_references(data):
    _detect_duplicate_references(data, set())

def _detect_duplicate_references(data, memo):
    if isinstance(data, (dict, list)):
        if id(data) in memo:
            raise ValueError(f'multiple references to object {data} detected in input')
        memo.add(id(data))
    if isinstance(data, list):
        for obj in data:
            _detect_duplicate_references(obj, memo)
    if isinstance(data, dict):
        for obj in data.values():
            _detect_duplicate_references(obj, memo)
user2357112
  • 260,549
  • 28
  • 431
  • 505
  • 2
    I like this. +1 This could easily be used to create a deep copy like @Kevin's answer but it's gloriously pedantic and feels so right. – r.ook Jan 31 '19 at 19:13
2
l = [[0] for _ in range(3)]
l[0][0] = 2
print(l)

prints:

[[2], [0], [0]]

also, btw in your code deepcopy() gave the output it did because you passed in a list which already had elements that share the same reference.

from copy import deepcopy
m = deepcopy(l)
m[0][0] = 3
print(l) # prints [[2], [0], [0]]
print(m) # prints [[3], [0], [0]]

you can see here it does what we expect it to no problem

sam46
  • 1,273
  • 9
  • 12
  • 1
    I don't think OP is asking for *ways to avoid a deep copy*, but rather *ways to get out of a deep copy*. – Rocky Li Jan 31 '19 at 18:45
1

Assuming only structure type will be list, this should work for list of arbitrary depth and complexity:

def customcopy(l):
    return [customcopy(e) if type(e) is list else e for e in l]

l = [[0]]*3
x = customcopy(l)
x[0][0] = 3

>>> x
[[3], [0], [0]]
Rocky Li
  • 5,641
  • 2
  • 17
  • 33
0

Not sure it's efficient but you could try:

l = [deepcopy(elt) for elt in l]
0

For a one-level-deep copy, you can just check the references. For deeper copies, just do this recursively.

from copy import deepcopy

def copy_value(l):
    l = list(l)
    new_list = []
    while l:
        item = l.pop(0)
        if item not in new_list:
            new_list.append(item)
        else:
            new_list.append(deepcopy(item))
    return new_list

l = [[0]]*3 
m = copy_value(l)
m[0][0] = 2
print(m)

prints

[[2], [0], [0]]
Tim
  • 3,178
  • 1
  • 13
  • 26
0

Another way with a list comprehension:

def super_deep_copy(l):
   return [i.copy() for i in l]
l = [[0]]*3
l = super_deep_copy(l)
l[0][0] = 2

And now:

print(l)

Is:

[[2], [0], [0]]
U13-Forward
  • 69,221
  • 14
  • 89
  • 114