
Disclaimer

This post is about the correct usage of the terms "shallow-copy" and "deep-copy", specifically when talking about copying an object which does not contain any references. This question is not meant to be (and should not be) opinion-based, unless there truly is no consensus regarding this topic. I have tagged this question as C, but it might be language-agnostic, unless the meaning of those terms in that context is well-defined for specific languages but not for others.

Preface

The terms "shallow-copy" and "deep-copy" are commonly used when copying an object with references, in order to specify whether or not the copy is complete (independent of the original).

However, I have also seen this terminology used when copying an object without references, where both terms mean the exact same thing and there would be no need to differentiate. So far, I have not found a concise definition which would cover this particular use of those terms.

  • The definitions given on Stack Overflow (in the tags shallow-copy and deep-copy):

    A shallow copy contains a link (address in memory) to the original variable. Changes on shallow copies are reflected on origin object.

    A deep copy duplicates the object or variable being pointed to so that the destination (the object being assigned to) receives its own local copy.

    Under these definitions, a copy of an object without references would be a deep-copy.

  • The definitions given on Wikipedia (in the article Object copying):

    One method of copying an object is the shallow copy. In that case a new object B is created, and the fields values of A are copied over to B. This is also known as a field-by-field copy, field-for-field copy, or field copy. If the field value is a reference to an object (e.g., a memory address) it copies the reference, hence referring to the same object as A does, and if the field value is a primitive type it copies the value of the primitive type. In languages without primitive types (where everything is an object), all fields of the copy B are references to the same objects as the fields of original A. The referenced objects are thus shared, so if one of these objects is modified (from A or B), the change is visible in the other. Shallow copies are simple and typically cheap, as they can be usually implemented by simply copying the bits exactly.

    An alternative is a deep copy, meaning that fields are dereferenced: rather than references to objects being copied, new copy objects are created for any referenced objects, and references to these placed in B. The result is different from the result a shallow copy gives in that the objects referenced by the copy B are distinct from those referenced by A, and independent. Deep copies are more expensive, due to needing to create additional objects, and can be substantially more complicated, due to references possibly forming a complicated graph.

    Under these definitions, a copy of an object without references would be a shallow-copy.
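
    To make the Wikipedia wording concrete, here is a minimal Python sketch using the standard `copy` module. The `Node` class and its fields are hypothetical names chosen only for illustration; the point is that a shallow copy shares the referenced list with the original while a deep copy gets its own.

    ```python
    import copy
    from dataclasses import dataclass, field


    @dataclass
    class Node:
        value: int                                     # plain value field
        children: list = field(default_factory=list)  # reference to another object


    a = Node(1, [10, 20])
    shallow = copy.copy(a)      # field-by-field copy: the list reference is copied
    deep = copy.deepcopy(a)     # the referenced list is duplicated as well

    a.children.append(30)       # mutate the object that A refers to

    print(shallow.children is a.children)  # the shallow copy shares the list
    print(deep.children)                   # the deep copy is unaffected
    ```

    Neither call needs to know anything about `Node` itself; the difference only shows up because `children` refers to a separate, mutable object.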

I think both terms are inappropriate, because "shallow-copy" implies that the copy is incomplete, whereas "deep-copy" implies that some kind of special treatment (or high cost) is required for copying. Since copying an object without references is both complete and yet does not require any special treatment, I would argue that neither of those terms should be used. However, this post is not about what I think, but what is the current consensus (if any) in the programming community.

Questions

When I copy an object without references, would that be considered

  • a shallow-copy (because no references are involved)?
  • a deep-copy (because the target object is independent from the source object)?
  • both?
  • neither?

Is there a good term for a partial deep-copy, where some fields are shallow-copied and others deep-copied?

Géry Ogam
Felix G
  • You're copying all of the pointers, and you're also not copying any, so it's both. – user253751 Jul 17 '20 at 15:11
  • *The definition on Wikipedia...* There's your mistake. Wikipedia is useful for things like supplying a description or examples. It's a huge mistake to treat it as authoritative... – Andrew Henle Jul 17 '20 at 15:14
  • The term isn't "simple" or "complex" object, the relevant terms are composition or aggregation. You can only perform a deep copy of an aggregation (regardless if members are embedded or referenced), while you *may* either perform shallow, copy-on-write or eager deep copy on a composition. – Ext3h Jul 17 '20 at 15:19
  • @AndrewHenle, although I agree that a certain amount of discretion needs to be applied to relying on Wikipedia, I find the particular quotations presented in the question to do a good job of defining the terms "shallow copy" and "deep copy" consistently with my understanding of them. I think the present SO tag info for these terms is misleading. – John Bollinger Jul 17 '20 at 15:40
  • Note: **I updated the tag wikis for deep-copy and shallow-copy**. Hopefully they no longer exhibit seeming discrepancies with the Wikipedia article (to which the deep-copy tag wiki already referred, and to which shallow-copy now refers, too.) – John Bollinger Jul 17 '20 at 16:36
  • @Ext3h yeah, i guess simple/complex really doesn't make sense here, so i have edited the question accordingly. However, while composition/aggregation are certainly strongly related to the topic, i'm not quite sure if those really are an exact match either. – Felix G Jul 18 '20 at 08:05
  • @AndrewHenle well, i certainly wouldn't use Wikipedia as a primary reference for... well... pretty much anything important, really. But i figured that, for most users of this site, both the tag definitions as well as that Wikipedia article are very obvious sources and therefore used those. This question is more about discussions on this topic i have seen in the past, and those definitions are essentially just meant as an example of "contradicting evidence". – Felix G Jul 18 '20 at 08:13

2 Answers


When the distinction doesn't apply, just call it a "copy". It's not a shallow copy because there are no shared references and it's not a deep copy because nothing but the values in the structure are copied.

This question is like asking if rocks are atheists. Sure, they aren't theists. But does the theist/atheist distinction really apply to them? Some scales are only designed for measuring certain things.

David Schwartz
  • Additionally, it may be useful to recognize that for taxonomic purposes, where there is no distinction, any copy is necessarily *both* deep and shallow. – John Bollinger Jul 17 '20 at 15:55
  • @JohnBollinger Unless you define shallow as "not deep" and deep as "not shallow", in which case it's neither. – David Schwartz Jul 17 '20 at 15:58
  • That's an interesting way to think about it, and it confirms my personal take on the matter. I had hoped for some kind of concise definition, but when thinking about it this way, it's kind of obvious that none would exist, because those terms just don't make sense in that context. – Felix G Jul 18 '20 at 08:25
  • I strongly disagree with this answer, because unless I am mistaken, @FelixG is not asking whether the shallow/deep copy distinction makes sense in a hypothetical programming language where only objects without reference attributes are allowed (similar to your rock analogy), which in that case of course would not make sense. He is asking whether the distinction makes sense in a real programming language (allowing objects with reference attributes) *for* a particular object without reference attributes. – Géry Ogam Jul 19 '20 at 11:39
  • … And in that case it makes sense, since there is no such thing as *the* "copy" operation. In most languages, there are at least *two* "copy" operations, a "shallow copy" operation (clone-1) and a "deep copy" operation (clone-∞), and some languages provide other clone-*k* operations, like Smalltalk, which also provides a clone-2 operation. So you can loosely call an operation on an object without reference attributes a "copy", but this is not an operation of the language, so you should call it a "shallow copy" or a "deep copy" to be precise, even if in that case the target object is the same. – Géry Ogam Jul 19 '20 at 12:15
  • Okay I understand the root of your confusion. You said: "It's not a shallow copy because there are no shared references and it's not a deep copy because nothing but the values in the structure are copied." and "Unless you define shallow as "not deep" and deep as "not shallow", in which case it's neither." These definitions are incorrect. A shallow/deep copy is the target object of a shallow/deep copy operation applied to a source object. – Géry Ogam Jul 19 '20 at 12:43
  • @Maggyero You can refer to the result of a shallow copy operation as a "shallow copy". But you can also call the process of making a shallow copy a "shallow copy". This is because the word "copy" can refer to both the process of making a duplicate and to the duplicate produced by that process. That applies also to "shallow copy" and "deep copy". The word "operation" is not necessary unless there's a chance of confusion, and here there isn't. – David Schwartz Jul 19 '20 at 18:51
  • We agree on that. So it is a contradiction to state that the result object cannot be a shallow copy nor a deep copy like you did in your post, since the process to make that copy was either a shallow copy or a deep copy. Do you see my point? – Géry Ogam Jul 19 '20 at 19:33
  • @Maggyero No, I don't. The process doesn't fit the normal understanding of a "shallow copy" because it doesn't fail to follow pointers/references. It doesn't fit the normal understanding of a "deep copy" because it doesn't copy referenced objects. So we're back to asking if rocks are atheists. – David Schwartz Jul 19 '20 at 19:35
  • The normal understanding of a shallow copy object is the result of a shallow copy operation, not a structural characterisation of the result object involving references. Same for a deep copy. For instance in the Python language, `5` is both a *shallow copy* and a *deep copy* of `5`, because `copy.copy(5) == 5` (shallow copy) and `copy.deepcopy(5) == 5` (deep copy). But it is not just a *copy* of `5` like you state, because this operation does *not* exist in the language! You are talking about a phantom. – Géry Ogam Jul 19 '20 at 19:46
  • @Maggyero The view that it is both a shallow copy and a deep copy is just as valid as the view that it is neither a shallow copy nor a deep copy. One can also argue that python's deep copy operation sometimes makes a shallow copy if it is not possible to make a deep copy. The terms simply aren't defined with sufficient precision to make a definitive answer meaningful. It's back to whether rocks are atheists. – David Schwartz Jul 19 '20 at 20:19
  • "The view that it is both a shallow copy and a deep copy is just as valid as the view that it is neither a shallow copy nor a deep copy." I disagree, it cannot be neither, by definition, since `copy.copy(5) == 5` (shallow copy) and `copy.deepcopy(5) == 5` (deep copy). It is as if you said that 4 is not the result of 2 × 2 because it is also the result of 2 + 2. That would not make any sense. – Géry Ogam Jul 19 '20 at 20:55
  • @Maggyero No, you're missing my point. Though the function is called "deep copy", we agree that it sometimes makes a shallow copy. Right? That some particular language happens to call a function "deep copy" does not mean it aligns with human understandings of what a "deep copy" is. A boolean function "isAtheist", if passed a rock will return "True" or "False" because that's all a boolean function can do. That doesn't mean humans find it meaningful to describe a rock as either an atheist or not an atheist. – David Schwartz Jul 19 '20 at 21:01
  • "Though the function is called "deep copy", we agree that it sometimes makes a shallow copy. Right?" No, that is not how the [deep copy function](https://github.com/python/cpython/blob/master/Lib/copy.py) is implemented in Python (`copy.deepcopy` does not delegate to `copy.copy`). And even if it was the case, that would then contradict your view that it is neither a shallow copy nor a deep copy. – Géry Ogam Jul 19 '20 at 22:49
  • Let us get back to the OP’s question: "When i copy an object that doesn't contain pointers to other objects, would that be considered a shallow-copy? (because no pointers were involved) a deep-copy? (because the destination object is independent of the source) both? neither?" Considered by who? If it is by the caller of the operation, he already knows the answer since he called the operation himself. If it is by someone else, he has to guess which operation has been called by looking at the structure of the source and target objects, and there are two solutions: {shallow copy, deep copy}. – Géry Ogam Jul 19 '20 at 23:06
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/218172/discussion-between-david-schwartz-and-maggyero). – David Schwartz Jul 19 '20 at 23:21

The paper "Copying and Comparing: Problems and Solutions", published by Peter Grogono and Markku Sakkinen in 2000, is a good reference for your questions.

Various copying operations can be applied to a source expression and a target expression:

  • assignment (also known as aliasing), which binds the target expression to the location of the source expression;
  • replacement (also known as mutation), which copies the contents of the source expression into the location of the target expression;
  • cloning, which binds the target expression to a new location and copies the contents of the source expression into that new location, i.e. which performs an allocation followed by a replacement.
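
As a rough illustration of these three operations, here is a minimal Python sketch on lists. The mapping is my own assumption rather than something taken from the paper: plain `=` stands in for assignment, slice assignment for replacement, and `copy.copy` for cloning. All variable names are made up for the example.

```python
import copy

source = [1, 2, 3]

# Assignment (aliasing): the target name is bound to the source's location.
alias = source

# Replacement (mutation): the contents of the source are copied into the
# location the target already occupies; the target keeps its identity.
target = [0, 0]
target_location = target   # remember the location to check it is unchanged
target[:] = source

# Cloning: a new location is allocated, then filled with the source's contents.
clone = copy.copy(source)

print(alias is source)             # same location
print(target is target_location)   # same location as before, new contents
print(clone is source)             # different location, equal contents
```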

In the following diagrams, arrows represent bindings, boxes represent locations, X, Y and Z are names, A, A′, B and B′ are values, • represents a reference, the first function parameter is the target expression and the second function parameter is the source expression.

[Figure: Copying operations]

Replacement and cloning can be further categorized by their depth:

  • shallow operation, which copies values and references;
  • deep operation, which copies values and performs deep operations on references.

The distinction between shallow and deep operations does not apply to assignment. Shallow cloning and deep cloning are often called shallow copy and deep copy respectively.

[Figure: Shallow and deep clones]

Since there are infinitely many possible depths, there are actually infinitely many replacement and cloning operations besides the shallow and deep ones.

We can define replace-k, a replacement of depth k, as follows:

  • replace-0(X, Y) performs assign(X, Y);
  • replace-k(X, Y) for k > 0 copies the values of Y into the location of X and performs replace-(k − 1) from the references of Y into the location of X.

We can define clone-k, a cloning of depth k, as follows:

  • clone-0(X, Y) performs assign(X, Y);
  • clone-k(X, Y) for k > 0 binds X to a new location, copies the values of Y into that new location and performs clone-(k − 1) from the references of Y into that new location.

Languages that provide cloning operations usually provide only clone-1 (shallow copy) and clone-∞ (deep copy).
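
The clone-k definition can be sketched in Python as follows. This is only an illustration under simplifying assumptions of mine: nested lists stand in for objects with references, anything that is not a list is treated as a plain value, and the structure must be acyclic (a cycle would recurse forever with k = ∞). The name `clone_k` is hypothetical.

```python
import math


def clone_k(source, k):
    """Return a depth-k clone of a structure made of nested lists."""
    if k == 0 or not isinstance(source, list):
        # clone-0 is plain assignment (aliasing); non-list items are values.
        return source
    # Allocate a new list and clone each referenced item one level shallower.
    return [clone_k(item, k - 1) for item in source]


inner = [1, 2]
middle = [inner]
outer = [middle]

c0 = clone_k(outer, 0)           # alias of outer
c1 = clone_k(outer, 1)           # shallow copy: new outer, shared middle
c2 = clone_k(outer, 2)           # new outer and middle, shared inner
c_inf = clone_k(outer, math.inf) # deep copy: nothing shared

print(c0 is outer, c1[0] is middle, c2[0][0] is inner, c_inf[0][0] is inner)
```

Passing `math.inf` works because `math.inf - 1` is still `math.inf`, so the depth never reaches 0 and the recursion stops only when it runs out of lists.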

Now that we have provided the definitions, let us address your questions.

When I copy an object without references, would that be considered

  • a shallow-copy (because no references are involved)?
  • a deep-copy (because the target object is independent from the source object)?
  • both?
  • neither?

It depends on who is considering the clone-k₀ operation with k₀ ≥ 1 that has been applied to the source object:

  • If it is considered by the caller, he already knows which operation he has applied to the source object, so the solution is: {clone-k₀}.
  • If it is considered by someone else, he has to guess which operation the caller could have applied to the source object only by comparing the structures of the source object and target object, so the solution is: {clone-1, clone-2, …, clone-∞}.
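
The second case can be illustrated with Python's `copy` module. This is a sketch with a hypothetical `Point` class; note that in Python the `int` attributes are technically references to immutable objects, so they only stand in for value fields here. An observer who sees just the source and the results cannot tell which operation produced which result.

```python
import copy
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


p = Point(1, 2)
shallow = copy.copy(p)       # clone-1
deep = copy.deepcopy(p)      # clone-∞

# Both results are new objects with the same contents as the source.
print(shallow == deep == p)
print(shallow is not p and deep is not p)

# Both are fully independent of the source.
p.x = 99
print(shallow.x, deep.x)
```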

Is there a good term for a partial deep-copy, where some fields are shallow-copied and others deep-copied?

Not to my knowledge, but this kind of copy is often more useful because it is semantic, whereas shallow copy and deep copy are syntactic. So I would call it a semantic copy, as hinted by the paper:

The shallow and deep operations are not generally useful. In most cases, “shallow” is too shallow and “deep” is too deep. In order to be generally applicable, copying operations should respect the semantic properties of objects rather than merely their syntactic properties.
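
Here is one way such a copy might look in Python. It is a sketch under my own assumptions, not code from the paper: the `Session` class, its fields and the `semantic_copy` method are hypothetical, and which fields are shared and which are duplicated is a decision based on what the fields mean.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Session:
    settings: dict                               # meant to be shared by all copies
    history: list = field(default_factory=list)  # meant to be private to each copy

    def semantic_copy(self):
        # Share the settings (shallow) but duplicate the history (deep).
        return Session(settings=self.settings,
                       history=copy.deepcopy(self.history))


original = Session({"theme": "dark"}, [["open", "a.txt"]])
duplicate = original.semantic_copy()

original.settings["theme"] = "light"      # visible through the duplicate
original.history[0].append("read-only")   # not visible through the duplicate

print(duplicate.settings["theme"])
print(duplicate.history)
```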

Géry Ogam
  • But couldn't the fact that both operations reduce to a copy of value attributes just mean that neither term is applicable (as claimed in the other answer)? After all, if both are equivalent, then why would one want to use either of those terms (instead of just "copy")? – Felix G Jul 18 '20 at 08:40
  • @FelixG You can use the term "copy" but this is not a single cloning operation, it is an infinite set of cloning operations: {clone-*k* | *k* in **N**}. So no, both terms "shallow copy" and "deep copy" are applicable because they are solutions. E.g. the solutions in **R** of the equation *x*² = 4 are the *x* in {−2, 2} and we would not say that −2 and 2 are not applicable. Similarly, the solutions in {shallow copy, deep copy} of the equation *f*(obj-without-ref) = obj-without-ref are the *f* in {shallow copy, deep copy} and we do not say that shallow copy and deep copy are not applicable. – Géry Ogam Jul 19 '20 at 11:17
  • @FelixG I have just updated my post with a better reasoning for answering your questions. – Géry Ogam Jul 19 '20 at 11:17