Why do some Ruby methods like String#replace mutate copies of variables?

Question

So first off I'm just learning Ruby and coming from a JavaScript background. I have a problem that I can't find an answer to. I have this example:

a = 'red'
b = a
b.replace('blue')
b = 'green'
print a

blue

My question is: why is this a thing? I understand that setting b = a gives them the same object_id, so there are technically two names for the same string. But I don't ever see a reason to use this sort of recursive value change. If I'm setting b = a it's because I want to manipulate the value of a without changing it.

Furthermore, it seems sometimes a method will modify a, but sometimes it will cause "b" to become a new object. This seems ambiguous and makes no sense.

When will I ever use this? What is the point? Does this mean I can't pass the value of a into another variable without any changes propagating back to a?

Many OO languages have this pointer-based way of working (variables point to objects, so you need to be aware of difference between manipulating a variable vs manipulating an object - your question shows you are not quite there yet, and I hope someone can make that clear for you). Ruby is unusual in that this applies always without exception - variables are *always* pointers to objects. Many other languages won't do this with Strings, but will with Arrays for example. — Neil Slater, Feb 23 '16 at 19:30
There's no recursion involved. I suggest you rephrase your wording. — Karoly Horvath, Feb 23 '16 at 19:34
@KarolyHorvath: But I think that is part of what any answer needs to address. I'm not sure what the end result should be (i.e. what words should be in the question to explain the problem). Someone looking for example of recursive code in Ruby would be disappointed to find this question, but perhaps it may help to point out the difference and correct terms here. — Neil Slater, Feb 23 '16 at 19:36
You know JavaScript so why is `var o = { a: 'b' }; var o2 = o; o2.a = 11` leaving `o.a === 11` a thing? Same situation, different languages. — mu is too short, Feb 23 '16 at 19:48
If you're doing `b = a` you want another object reference. If you're doing `b = a.dup` you want a *copy* that's independent. You seem to think "recurse" is synonymous with "propagate", but it's not. Recursion is best illustrated when a method calls itself. — tadman, Feb 23 '16 at 19:50

Neil Slater · Accepted Answer · 2016-02-23T20:25:19.073

The issue here is not called recursion, and Ruby variables are not recursive (for any normal meaning of the word - i.e. they don't reference themselves, and you don't need recursive routines in order to work with them). Recursion in computer programming is when code calls itself, directly or indirectly, such as a function that contains a call to itself.

In Ruby, all variables point to objects. This is without exception - although there are some internal tricks to make things fast, even writing a=5 creates a variable called a and "points" it to the Fixnum object 5 (Fixnum was merged into Integer in Ruby 2.4) - careful language design means you almost don't notice this happening. Most importantly, numbers cannot change (you cannot change a 5 into a 6, they are always different objects), so you can think that somehow a "contains" a 5 and get away with it even though technically a points to 5.
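A minimal sketch of the point about numbers, using only core Ruby. The variable names are illustrative. Because an Integer cannot be modified in place, the only thing `b += 1` can do is point b at a different object, so a is never affected:

```ruby
a = 5
b = a            # b points to the same Integer object as a
b += 1           # shorthand for b = b + 1: computes a new value and re-points b

puts a           # prints 5
puts b           # prints 6
puts 5.frozen?   # prints true: Integer objects cannot be changed in place
```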

With Strings though, the objects can be changed. A step-by-step explanation of your example code might read like this:

a = 'red'

Creates a new String object with the contents "red", and points variable a at it.

b = a

Points variable b to same object as a.

b.replace('blue')

Calls the replace method on the object pointed to by b (and also pointed to by a). The method alters the contents of the String to "blue".

b = 'green'

Creates a new String object with the contents "green", and points variable b at it. The variables a and b now point to different objects.

print a

The String object pointed to by a has contents "blue". So it is all working correctly, according to the language spec.
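The walkthrough above can be checked directly. This is a sketch rather than part of the original answer: `equal?` compares object identity, the `same_before` and `same_after` names are made up for illustration, and the `.dup` on the literal is only there so the example also runs in files where string literals are frozen.

```ruby
a = 'red'.dup               # a points to a new, mutable String
b = a                       # b points to the same object as a
same_before = a.equal?(b)   # identity check, not a contents check

b.replace('blue')           # changes the contents of the shared object
b = 'green'                 # re-points b at a brand-new String
same_after = a.equal?(b)

puts same_before            # prints true
puts same_after             # prints false
puts a                      # prints blue
puts b                      # prints green
```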

When will I ever use this?

All the time. In Ruby you use variables to point, temporarily, to objects, in order to call methods on them. The objects are the things you want to work with; the variables are the names your code uses to reference them. The fact that they are separate can trip you up from time to time (especially in Ruby with Strings, where many other languages do not have this behaviour).

and does this mean I can't pass the value of "a" into another variable without any changes recursing back to "a"?

If you want to copy a String, there are a few ways to do it. E.g.

b = a.clone

or

b = "#{a}"

However, in practice you rarely just want to make direct copies of strings. You will want to do something else that is related to the goal of your code. Usually in Ruby, there will be a method that does the manipulation you need and returns a new String, so you would do something like this:

b = a.something

In other cases, you actually will want changes to be made to the original object. It all depends on what the purpose of your code is. In-place changes to String objects can be useful, so Ruby supports them.
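As a concrete sketch of both cases, here `upcase` stands in for the placeholder `something` (that choice is mine, not the answer's). `String#upcase` returns a new String, while `String#upcase!` changes the receiver in place:

```ruby
a = 'red'.dup             # .dup keeps this working if string literals are frozen
b = a.upcase              # returns a new String; a is left alone

puts a                    # prints red
puts b                    # prints RED
puts a.equal?(b)          # prints false: two separate objects

a.upcase!                 # in-place variant: changes the object a points to
puts a                    # prints RED
```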

Furthermore it seems sometimes a method will recurse into "a" and sometimes it will cause "b" to become a new object_id.

This is never the case. No method can change an object's identity. Most methods return a new object, and some change the contents of the object they are called on. It is that second group you need to watch in Ruby, because the data you change may be in use elsewhere. The same is true in other OO languages; JavaScript objects are no exception and behave in exactly the same way.
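A small sketch of the two kinds of method, assuming nothing beyond core String. `<<` appends to the existing object, so its identity is unchanged, while `+` builds and returns a separate object. The variable names are illustrative:

```ruby
s = 'foo'.dup
id_before = s.object_id

s << 'bar'                                  # mutates contents in place
same_identity = (s.object_id == id_before)  # identity is unchanged

t = s + '!'                                 # returns a new String, s untouched

puts same_identity        # prints true
puts s                    # prints foobar
puts t                    # prints foobar!
puts t.equal?(s)          # prints false
```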

Thanks that actually clears things up quite a bit. the b = a.something portion really made it click for me. — Jwookie55, Feb 24 '16 at 20:09

Nabeel · Answer 2 · 2016-02-23T19:55:22.033

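It can be useful when you need to walk down into a nested hash while building it. Here temp_obj is re-pointed at each newly created inner hash, while obj keeps pointing at the outermost one: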

obj = {}
ary = [1,2,3]

temp_obj = obj

ary.each do |entry|
  temp_obj[entry] = {}
  temp_obj = temp_obj[entry]
end

> obj
=> {1=>{2=>{3=>{}}}}

If you wish to duplicate you could just use dup

> a = 'red'
=> "red"
> b = a.dup
=> "red"
> b.replace('orange')
=> "orange"
> a
=> "red"
> b
=> "orange"

However, as pointed out in the comments, dup only makes a shallow copy: nested objects are still shared between the original and the copy. For example:

> a = {hello: {world: 1}}
 => {:hello=>{:world=>1}}
> b = a.dup
 => {:hello=>{:world=>1}}
> b[:hello][:world] = 4
 => 4
> a
 => {:hello=>{:world=>4}}
> b
 => {:hello=>{:world=>4}}

Ah - so I suppose you would have to use Marshal then? – Nabeel Feb 23 '16 at 19:51
ActiveSupport provides `deep_dup` for this situation. For flat objects like String, `dup` is sufficient. – tadman Feb 23 '16 at 19:52
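To follow up on the Marshal suggestion in the comments, here is a minimal sketch of a deep copy using only core Ruby. It assumes the structure contains only objects Marshal can serialize (procs and IO objects cannot be dumped), and `Marshal.load` should never be run on untrusted data:

```ruby
a = { hello: { world: 1 } }
b = Marshal.load(Marshal.dump(a))   # serialize, then rebuild every nested object

b[:hello][:world] = 4               # only the copy's inner hash changes

puts a[:hello][:world]              # prints 1
puts b[:hello][:world]              # prints 4
puts a[:hello].equal?(b[:hello])    # prints false: the inner hashes are distinct
```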

Todd A. Jacobs · Answer 3 · 2016-02-23T21:31:56.983

TL;DR

In your original question, now edited, you are confusing recursion with mutation and propagation. All three concepts are useful tools in the right situations, and when the behavior is expected. You likely find the particular example you posted confusing because you aren't expecting the string to mutate in place, or for the change to propagate across all pointers to that object.

The ability to generalize methods is what enables duck-typing in dynamic languages like Ruby. The main conceptual hurdle is understanding that variables point to objects, and only experience with the core and standard libraries will enable you to understand how objects respond to particular messages.

Strings in Ruby are full-fledged objects that respond to messages, rather than simply being language primitives. In the following sections, I attempt to explain why this is rarely a problem, and why the feature is useful in a dynamic language like Ruby. I also cover a related method that produces the behavior you were originally expecting.

It's All About Object Assignment

My question is why is this a thing. I understand that setting "b=a" makes them the same object_id so there are technically two names for the same variable string.

This is rarely a problem in everyday programming. Consider the following:

a = 'foo' # assign string to a
b = a     # b now points to the same object as a
b = 'bar' # assign a different string object to b

[a, b]
#=> ["foo", "bar"]

This works the way you'd expect, because the variable is just a placeholder for an object. As long as you're assigning objects to variables, Ruby does what you might intuitively expect.

Objects Receive Messages

In your posted example, you're running into this behavior because what you're really doing is:

a = 'foo'       # assign a string to a
b = a           # assign the object held in a to b as well
b.replace 'bar' # send the :replace message to the string object

In this case, b.replace sends the :replace message to the single String object that both a and b point to. Since both variables refer to the same object, its contents are replaced whether you invoke the method as a.replace or b.replace.

This is perhaps not intuitive, but it is rarely a problem in practice. In many cases, this behavior is actually desirable so that you can pass objects around without caring how a method labels an object internally. This is useful for generalizing a method, or for self-documenting a method's signature. For example:

def replace_house str
  str.sub! 'house', 'guard'
end

def replace_cat str
  str.sub! 'cat', 'dog'
end

critter = 'house cat'
replace_house critter; replace_cat critter
#=> "guard dog"

In this example, each method expects a String object. It doesn't care that the string is labeled critter elsewhere; internally, the method uses the label str to refer to that same object.

As long as you know when a method mutates the receiver and when it passes back a new object, you will be unsurprised by the results. More on this in a moment.
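As a quick illustration of that distinction, here is a sketch contrasting `String#sub` with `String#sub!`. The variable names are made up for the example. One detail worth knowing is that `sub!` returns nil when nothing was substituted, so its return value should not be chained blindly:

```ruby
original = 'house cat'.dup
copy = original.sub('cat', 'dog')        # non-mutating: returns a new String

puts original                            # prints house cat
puts copy                                # prints house dog

result = original.sub!('cat', 'dog')     # mutating: changes original in place
puts original                            # prints house dog
puts result.equal?(original)             # prints true: the receiver itself is returned

no_match = original.sub!('bird', 'fish') # nothing to replace
puts no_match.nil?                       # prints true
```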

What String#replace Really Does

In your specific example, I can see how the documentation for String#replace might be confusing. The documentation says:

replace(other_str) → str
Replaces the contents and taintedness of str with the corresponding values in other_str.

What this really means is that b.replace is actually mutating the object ("replacing the contents"), not returning a new object for assignment to the variable. For example:

# Assign the same String object to a pair of variables.
a = 'foo'; b = a;

a.object_id
#=> 70281327639900

b.object_id
#=> 70281327639900

b.replace 'bar'
#=> "bar"

b.object_id
#=> 70281327639900

a.object_id == b.object_id
#=> true

Note that the object_id never changes. The particular method you used reuses the same object; it just changes its contents. Contrast this with methods like String#sub which return a copy of the object, which means you'd get back a new object with a different object_id.

What to Do Instead: Assigning New Objects

If you want a and b to point to different objects, you can use a non-mutating method like String#sub instead:

a = 'foo'; b = a;
b = b.sub 'oo', 'um'
#=> "fum"

[a.object_id, b.object_id]
#=> [70189329491000, 70189329442400]

[a, b]
#=> ["foo", "fum"]

In this rather contrived example, b.sub returns a new String object, which is then assigned to the variable b. This results in different objects being assigned to each variable, which is the behavior you were originally expecting.