
I really don't understand the difference between shallow and deep copy. Ruby's #dup seems to create a deep copy when I test it.

Documentation says:

Produces a shallow copy of obj---the instance variables of obj are
copied, but not the objects they reference.

But when I test this it seems to change the objects they reference.

class Klass
  attr_accessor :name
end

a = Klass.new
a.name = "John"
b = a.dup
b.name = "Sue"
puts a.name # John

Why is a shallow copy sufficient here when @name is one of the objects they reference?
What's the simplest example where deep copy is needed?

Marko Avlijaš

2 Answers


The example you have shown does not describe the difference between a deep and a shallow copy. Instead, consider this example:

class Klass
  attr_accessor :name
end

anna = Klass.new
anna.name = 'Anna'

anna_lisa = anna.dup
anna_lisa.name << ' Lisa'
# => "Anna Lisa"

anna.name
# => "Anna Lisa"

Generally, dup and clone are both expected to duplicate only the object you call the method on. Other referenced objects, like the name String in the example above, are not duplicated. Thus, after the duplication, both the original and the duplicated object point to the very same name string.
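
If one specific attribute should not be shared, one option is to override initialize_copy, the hook Ruby calls on the new object during dup and clone. This is only a sketch for the Klass from the example above; the choice to copy just @name one level deep is an assumption made for illustration.

```ruby
class Klass
  attr_accessor :name

  # Ruby calls this on the freshly created copy during dup and clone.
  def initialize_copy(source)
    super
    # Assumption for this sketch: only @name needs its own copy.
    @name = source.name.dup
  end
end

anna = Klass.new
anna.name = 'Anna'

anna_lisa = anna.dup
anna_lisa.name << ' Lisa'

puts anna.name       # Anna
puts anna_lisa.name  # Anna Lisa
```

Anything @name itself references would still be shared, so this is one level deeper than the default, not a full deep copy.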

With a deep_dup, typically all (relevant) referenced objects are duplicated too, often to arbitrary depth. Since this is rather hard to achieve for all possible object references, people often rely on implementations for specific kinds of objects, such as hashes and arrays.
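
A minimal sketch of such a container-specific implementation could look like the following. The deep_dup method here is a hypothetical helper written for this example, not part of core Ruby. It assumes Ruby 2.4 or later (where dup on Integers, Symbols and nil returns the object itself), and it handles neither cyclic structures nor the instance variables of custom objects.

```ruby
# Hypothetical helper: recursively duplicates Hashes and Arrays and
# falls back to a plain (shallow) dup for everything else.
def deep_dup(obj)
  case obj
  when Hash
    obj.each_with_object({}) { |(key, value), copy| copy[deep_dup(key)] = deep_dup(value) }
  when Array
    obj.map { |element| deep_dup(element) }
  else
    obj.dup
  end
end

original = { names: ['Anna', 'Lena'] }
shallow  = original.dup
deep     = deep_dup(original)

shallow[:names] << 'Lisa'  # mutates the array shared with original
deep[:names] << 'Maria'    # only touches the independent copy

p original[:names]  # ["Anna", "Lena", "Lisa"]
p deep[:names]      # ["Anna", "Lena", "Maria"]
```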

A common workaround for a rather generic deep-dup is to use Ruby's Marshal module to serialize an object graph and immediately deserialize it again.

anna_lena = Marshal.load(Marshal.dump(anna))

This creates new objects and is effectively a deep_dup. Since most objects support marshaling out of the box, this is a rather powerful mechanism. Note though that you should never unmarshal (i.e. load) user-provided data, since this can lead to a remote code execution vulnerability.
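
To check that the round trip really produces independent objects, the earlier example can be repeated with the marshaled copy. The second half shows one known limit: some objects, such as Procs, cannot be dumped at all and raise a TypeError.

```ruby
class Klass
  attr_accessor :name
end

anna = Klass.new
anna.name = 'Anna'

# Serialize the whole object graph and load it back as new objects.
anna_lena = Marshal.load(Marshal.dump(anna))
anna_lena.name << ' Lena'

puts anna.name       # Anna
puts anna_lena.name  # Anna Lena

# Not everything can be marshaled; Procs are one example.
begin
  Marshal.dump(-> { 1 })
rescue TypeError => e
  puts "cannot dump a Proc: #{e.message}"
end
```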

Holger Just

Try this:

class Klass
  attr_accessor :name
end

a = Klass.new
a.name = Klass.new #object inside object
a.name.name = 'George'
b = a.dup
puts b.name.name # George

b.name.name = 'Alex'
puts a.name.name # Alex

Also note that:

When using dup, any modules that the object has been extended with will not be copied.
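
A short way to see this difference between dup and clone (the Greeter module is a made-up name for this sketch):

```ruby
# Hypothetical module, used only to extend a single object.
module Greeter
  def greet
    "Hello from #{name}"
  end
end

class Klass
  attr_accessor :name
end

a = Klass.new
a.name = 'John'
a.extend(Greeter)  # adds greet to this one object only

puts a.dup.respond_to?(:greet)    # false
puts a.clone.respond_to?(:greet)  # true
```

clone copies the object's singleton class, which is where extend puts the module, while dup does not.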

Edit: A note on Strings (this was interesting to find out). In the original scenario, the string is referenced, not copied. This case demonstrates it:

a.name = 'George'
puts a.name.object_id # 69918291262760    

b = a.dup
puts b.name # George
puts b.name.object_id # 69918291262760  

b.name.concat ' likes tomatoes' # append to existing string
puts b.name.object_id # 69918291262760  

puts a.name # George likes tomatoes

This works as expected. Referenced objects (including strings) are not copied; the original and the copy share the same reference.

So why does the original example appear not to? Because when you set b.name to something different, you are assigning it a new string.

   a.name = 'hello' 

is effectively shorthand for this, since evaluating a string literal normally creates a new String object:

   a.name = String.new('hello')

Therefore, in the original example, a.name and b.name no longer reference the same object after the assignment. You can check the object_id to see this.
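
Applied to the code from the question, the check could look like this (equal? compares object identity, just like comparing object_id values):

```ruby
class Klass
  attr_accessor :name
end

a = Klass.new
a.name = 'John'
b = a.dup

puts a.name.equal?(b.name)  # true: both still point to the same String

b.name = 'Sue'              # reassignment: b now points to a new String
puts a.name.equal?(b.name)  # false
puts a.name                 # John
```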

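Note that none of this causes trouble for Integers (Fixnum), Floats, true, false, nil or Symbols. A shallow copy shares these values as well, but they are immutable, so there is no way to change them in place and thereby affect the other object.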
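
A small sketch with an Integer attribute illustrates why value types like these are unproblematic in a shallow copy. The Counter class is a made-up example. Since an Integer cannot be changed in place, the only way to "change" it is reassignment, which never touches the other object.

```ruby
# Hypothetical class holding an Integer.
class Counter
  attr_accessor :count
end

a = Counter.new
a.count = 42
b = a.dup

puts a.count.equal?(b.count)  # true: the very same Integer object

b.count += 1                  # reassigns b.count, a is unaffected
puts a.count                  # 42
puts b.count                  # 43
```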

ABrowne
  • Excellent succinct example, but why does this make `#dup` fail? This is making me confused. – Marko Avlijaš Aug 30 '16 at 13:42
  • This should help explain the special case of string objects. Unlike fixnums and symbols, which are not 'real' objects but just act like them, Strings have some special mechanisms to conserve space. They are neither a 'normal' object nor a fake object like fixnums: http://patshaughnessy.net/2012/1/18/seeing-double-how-ruby-shares-string-values – ABrowne Aug 30 '16 at 14:08
  • Thanks. Copy on write optimization for strings is interesting, but not really related to this. – Marko Avlijaš Aug 31 '16 at 05:46
  • @ABrowne In your string example, you are still assigning a new string. Thus, in the end `a.name` will still be `"George"`. The `+=` operator does not modify objects but assigns the result of the addition as a new object. – Holger Just Aug 31 '16 at 11:01
  • @HolgerJust good point, swapped it to an operation that better fits my example. Thanks – ABrowne Aug 31 '16 at 13:03