I'm trying to modify a copy of an array without changing the original array. It's an array of hashes, so to make an "all new" copy of the array I use:

foo = [ { :a => "aaaaaa" } ]
foocopy = foo.map { |h| h.dup }

I want to append some data to a string in the hash in the copy.

It works fine if I use = and +:

foocopy.first[:a] = foocopy.first[:a] + "bbbbb"
foo
=> [{:a=>"aaaaaa"}]  # original unchanged as expected
foocopy
=> [{:a=>"aaaaaabbbbb"}]

However, if I use <<, it modifies BOTH the copy and the original:

foocopy.first[:a] << "cccccc"
foo
=> [{:a=>"aaaaaacccccc"}]   # ORIGINAL got changed too
foocopy
=> [{:a=>"aaaaaacccccc"}]

Is that a bug in Ruby?

jpw
  • It's *highly* unlikely you'll find a bug in well-beaten things like `<<` and `+` when working with arrays. – the Tin Man Jul 02 '15 at 19:30
  • And yet... it happens... http://stackoverflow.com/questions/29224421/rails-3-2-saving-serialized-hash-will-not-save-number-with-delimiter which as far as anyone weighed in seems to be a rails bug that persists to 4. In this case, because of the particular case (modifying a string inside a hash inside a copied array) it seemed plausibly to be a bug. But the explanation given by @jorge was instructive. My point being, while unlikely, it's not impossible and AFAIK SO is a pretty good place to ask and find out. – jpw Jul 02 '15 at 19:44
  • 1
    Rails is hardly the same layer as Ruby and `<<` and `+` are used an inordinately number of times more than a Rails method, *any* Rails method. So, finding a bug in Rails is much more likely than a core method in Ruby. – the Tin Man Jul 02 '15 at 20:15

2 Answers

No, it's not a bug. You duplicated the array and the hash, but dup only makes a shallow copy: the new hash still refers to the very same String object (same object_id) as the original hash.

irb(main):001:0> foo = [ { :a => "aaaaaa" } ]
=> [{:a=>"aaaaaa"}]
irb(main):002:0> foocopy = foo.map { |h| h.dup }
=> [{:a=>"aaaaaa"}]
irb(main):003:0> foo.object_id
=> 70252221980900
irb(main):004:0> foocopy.object_id
=> 70252221915920
irb(main):005:0> foocopy.first.object_id
=> 70252221915880
irb(main):006:0> foo.first.object_id
=> 70252221980940
irb(main):007:0> foocopy.first[:a].object_id
=> 70252221980960
irb(main):008:0> foo.first[:a].object_id
=> 70252221980960

This means that a + b returns a new String object and leaves a unchanged, whereas a << b modifies the existing object in place. That is the documented behaviour of both methods.

Just with the string:

irb(main):009:0> a = "test"
=> "test"
irb(main):010:0> b = a.dup
=> "test"
irb(main):011:0> a.object_id
=> 70252221685660
irb(main):012:0> b.object_id
=> 70252221662100
irb(main):013:0> a = a + "1"
=> "test1"
irb(main):014:0> a.object_id
=> 70252221586140
irb(main):015:0> b << "1"
=> "test1"
irb(main):016:0> b.object_id
=> 70252221662100

And from the documentation:

http://ruby-doc.org/core-2.2.0/String.html#method-i-2B

http://ruby-doc.org/core-2.2.0/String.html#method-i-3C-3C
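
To connect this back to the question, here is a minimal sketch of the same effect with two variables naming one String, which is the situation the copied hash is in. The variable names are illustrative only, and it assumes string literals are not frozen (no `# frozen_string_literal: true` magic comment).

```ruby
a = "test"
b = a              # b and a refer to the same String object

c = a + "1"        # String#+ builds a new String; a is untouched
b << "2"           # String#<< mutates the shared object in place

puts a             # test2  (changed through b)
puts c             # test1
puts a.equal?(b)   # true   (same object)
puts a.equal?(c)   # false  (c is a separate object)
```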

Jorge de los Santos

dup performs a "shallow copy" of an object. So you are creating a new Hash whose keys and values are the very same objects as in the original! Unfortunately Ruby doesn't have a nice built-in way to create a "deep copy" of a Hash, where all referenced objects are also copied. So what should you do?

I think you already found the best solution, which is to use +=. That's because + creates a new String and = stores it under the key in the copied hash, so the string still referenced by the original hash is never touched.
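
A short sketch of the += approach, reusing the names from the question (assuming string literals are not frozen):

```ruby
foo = [ { :a => "aaaaaa" } ]
foocopy = foo.map { |h| h.dup }

# Shorthand for: foocopy.first[:a] = foocopy.first[:a] + "bbbbb"
foocopy.first[:a] += "bbbbb"

puts foo.first[:a]       # aaaaaa
puts foocopy.first[:a]   # aaaaaabbbbb

# The two hashes no longer share the value object.
puts foo.first[:a].equal?(foocopy.first[:a])   # false
```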

But there is a simple hack to deep copy an object in Ruby, which is to serialize/unserialize it using Marshal.

foo = [ { :a => "aaaaaa" } ]
foocopy = Marshal.load(Marshal.dump(foo))

Then you won't have any surprises due to references being shared between the original and the copy. And your << code will work as you expected.
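
If Marshal is not an option (Marshal.dump raises a TypeError for objects such as Proc or IO), a hand-written recursive copy is another possibility. The sketch below is only an illustration: deep_copy is a hypothetical helper, not a Ruby built-in, and it only handles Hash, Array and String. Anything else is returned as-is and therefore still shared.

```ruby
# Hypothetical helper, not part of Ruby's core library.
def deep_copy(obj)
  case obj
  when Hash
    # Copy every value recursively; keys are reused as they are.
    obj.each_with_object({}) { |(k, v), copy| copy[k] = deep_copy(v) }
  when Array
    obj.map { |e| deep_copy(e) }
  when String
    obj.dup
  else
    obj   # symbols, numbers, nil, etc. are returned unchanged
  end
end

foo = [ { :a => "aaaaaa" } ]
foocopy = deep_copy(foo)

foocopy.first[:a] << "cccccc"

puts foo.first[:a]       # aaaaaa
puts foocopy.first[:a]   # aaaaaacccccc
```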

Max