How to compare two structures via pointer equality in Elixir / Erlang

Question

(Example given in Elixir.)

Suppose I have the following code,

x = {1, 2}
a1 = {"a", {1, 2}}
a2 = {"a", {1, 2}}
a3 = {"a", x}

which as far as I know creates three tuples {1, 2} at different memory locations.

Using the operators == or === to compare any of the a variables always returns true. This is expected, as these two operators differ only when comparing numeric types (i.e., 1 == 1.0 is true, but 1 === 1.0 is false).

So, I then tried comparing the structures via pattern matching, using the following module (strictly created to test my case),

defmodule Test do
  def same?({x, y}, {x, y}), do: true
  def same?(_, _), do: false
end

but calling Test.same?(a1, a3) also returns true.

How can I compare two structures using pointer equality, so that I can determine if they are the same structure in memory?

Thanks

That is a very interesting question, to which I would like to see an answer with references to the way the Erlang VM works. It would shed some light on Elixir/Erlang mechanics. — Nathan Ripert, Sep 10 '18 at 13:18
@NathanRipert I think I have answered the way you are looking for. — mljrg, Sep 10 '18 at 15:23
You may find this: https://github.com/happi/theBeamBook helpful in answering this sort of question. Short answer: you shouldn't care. Erlang is intentionally designed to hide these implementation details. — Onorio Catenacci, Sep 11 '18 at 13:03
@OnorioCatenacci Yes, I should not care. But I should care for duplication between large structures, especially when the duplication can be very large. It was in this context, to eliminate such duplication, that I came up with the question. See my own answer to the question below. — mljrg, Sep 11 '18 at 15:42
That's premature optimization plain and simple. You're assuming that all of your code will live on one machine--or at least in one memory space. Erlang permits (indeed encourages) running applications that live on multiple machines. Comparing references at that point is not only useless--it can lead you to believe things which are simply not true. You should learn the Erlang idioms before you decide that you need to optimize things. — Onorio Catenacci, Sep 11 '18 at 16:58
@OnorioCatenacci Don't assume that everyone is able to pay for several machines in the cloud. I agree that performance optimization comes third (the first being make it work, the second make it right), but if there is the chance of duplication of very large substructures, no idiom will save your day, especially if you want to cache such structures and are paying a lot for RAM and disk in the cloud. — mljrg, Sep 11 '18 at 17:06
See this [Q & A](https://stackoverflow.com/questions/3406425/does-erlang-always-copy-messages-between-processes-on-the-same-node). Specifically this answer from @RobertVirding (one of the creators of Erlang): "As has been mentioned here and in other questions current versions of Erlang basically copy everything except for larger binaries. In older pre-SMP times it was feasible to not copy but pass references. While this resulted in very fast message passing it created other problems in the implementation . . ." — Onorio Catenacci, Sep 11 '18 at 17:23
@OnorioCatenacci If I understand what you are worried about (thanks for this, I appreciate it), I am aware that Erlang copies everything except large binaries, which means that it will copy large structures, hence generating duplication again, right? But if the structures that may have duplicate subtrees always live inside the same process, then removing such duplication inside the process can save lots of memory. I am talking about compressing large amounts of duplicate data inside the same process. — mljrg, Sep 11 '18 at 17:29

legoscia · Accepted Answer · 2018-09-10T13:57:31.717

There is no "official" way to do this, and I would say that if you think you actually need to do this, you're doing something wrong and should ask another question about how to achieve the goal you want to achieve. So this answer is offered in the spirit of playfulness and exploration, in the hope that it spreads some interesting knowledge about the Erlang/Elixir VM.

There is a function, erts_debug:size/1, that tells you how many memory "words" an Erlang/Elixir term occupies. This table tells you how many words various terms use. In particular, a tuple uses 1 word, plus 1 word for each element, plus the storage space for any elements that are "non-immediate". We're using small integers as elements, and they are "immediates" and thus "free". So this checks out:

> :erts_debug.size({1,2})
3

Now let's make a tuple containing two of those tuples:

> :erts_debug.size({{1,2}, {1,2}})
9

That makes sense: the two inner tuples are 3 words each, and the outer tuple is 1+2 words, for a total of 9 words.

But what if we put the inner tuple in a variable?

> x = {1, 2}
{1, 2}
> :erts_debug.size({x, x})
6

Look, we saved 3 words! That's because the contents of x count only once; the outer tuple points to the same inner tuple twice.
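
As a rough sketch of the same effect outside the shell: :erts_debug.flat_size/1 (equally undocumented) reports the size as if nothing were shared, so comparing it with :erts_debug.size/1 shows how much sharing a term contains. The module name SharingDemo and the function sizes/2 are made up for this illustration, and the inner tuple is built from function arguments so that it is constructed at run time.

```elixir
defmodule SharingDemo do
  # Hypothetical helper: builds the inner tuple from arguments so it is
  # created at run time, then points the outer tuple at it twice.
  def sizes(a, b) do
    inner = {a, b}
    outer = {inner, inner}
    # size/1 counts a shared subterm once; flat_size/1 counts it every time.
    {:erts_debug.size(outer), :erts_debug.flat_size(outer)}
  end
end

IO.inspect(SharingDemo.sizes(1, 2))
# {6, 9}
```

The 3-word difference is the inner tuple that flat_size/1 counts twice.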

So let's write a little function that does this for us:

defmodule Test do
  def same?(a, b) do
    a_size = :erts_debug.size(a)
    b_size = :erts_debug.size(b)
    # Three words for the outer tuple; everything else is shared
    a_size == b_size and :erts_debug.size({a,b}) == a_size + 3
  end
end

System working? Seems to be:

> Test.same? x, {1,2}
false
> Test.same? x, x
true

Goal accomplished!

However, say we're trying to call this function from another function in a compiled module, not from the iex shell:

  def try_it() do
    x = {1, 2}
    a1 = {"a", {1, 2}}
    a2 = {"a", {1, 2}}
    a3 = {"a", x}

    IO.puts "a1 and a2 same? #{same?(a1,a2)}"
    IO.puts "a1 and a3 same? #{same?(a1,a3)}"
    IO.puts "a3 and a2 same? #{same?(a3,a2)}"
  end

That prints:

> Test.try_it
a1 and a2 same? true
a1 and a3 same? true
a3 and a2 same? true

That's because the compiler is smart enough to see that those literals are equal, and coalesces them to one term while compiling.

Note that this sharing of terms is lost when terms are sent to another process, or stored in / retrieved from an ETS table. See the Process Messages section of the Erlang Efficiency Guide for details.
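
A minimal sketch of how one might observe this, assuming a default VM build: the term is measured before sending and again inside the receiving process. CopyDemo, pair/2 and size_in_other_process/1 are hypothetical names. The tuple is built at run time on purpose, because literals in compiled code are handled differently from terms on the process heap.

```elixir
defmodule CopyDemo do
  # Hypothetical helper: allocates a fresh tuple at run time.
  def pair(a, b), do: {a, b}

  # Sends `term` to a new process and returns the size measured there.
  def size_in_other_process(term) do
    parent = self()

    pid =
      spawn(fn ->
        receive do
          {:measure, t} -> send(parent, {:size, :erts_debug.size(t)})
        end
      end)

    send(pid, {:measure, term})

    receive do
      {:size, n} -> n
    after
      1_000 -> :timeout
    end
  end
end

inner = CopyDemo.pair(1, 2)
outer = {inner, inner}

IO.inspect(:erts_debug.size(outer))             # 6 in the sending process
IO.inspect(CopyDemo.size_in_other_process(outer))
```

On a standard build I would expect the second number to be larger than the first, since the copy no longer shares the inner tuple; a VM built with the experimental sharing-preserving copy may report the same size.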

“this sharing of terms is lost”—that’s not exactly true. E.g. atoms are global. — Aleksei Matiushkin, Sep 10 '18 at 13:38
@legoscia I understand what the compiler is doing there, and it's similar to what many compilers in other languages do, known as [constant folding](https://en.wikipedia.org/wiki/Constant_folding). But what happens at runtime if I create two structures from two files whose content shares some text, and read them inside the same process? Will those structures share the same parts of the structure corresponding to the same content in the files? — mljrg, Sep 10 '18 at 13:46
There is no mechanism for folding terms created at runtime (except for atoms), so those terms would not share memory - unless the OTP team comes up with some clever optimisation in a future release. — legoscia, Sep 10 '18 at 13:48
@legoscia So that means that the same content in the two files necessarily results in memory duplication in the two structures, which is what I was expecting (and normal). — mljrg, Sep 10 '18 at 14:02
I assume then that operators `==` and `===` will internally detect when two substructures are the same element in memory, and immediately compare them as `true`, instead of going all the way down the same substructure during comparison. Do you know whether this is true? — mljrg, Sep 10 '18 at 14:05
Yes, that happens here: https://github.com/erlang/otp/blob/fd591b6f7bb681dd5335a67e66b1d0b8ecf2a76f/erts/emulator/beam/utils.c#L2763-L2765 — legoscia, Sep 10 '18 at 14:52
I would like to thank you for your time, and especially for telling me about function `:erts_debug.size(x)`, which was crucial for my understanding of this subject. — mljrg, Sep 10 '18 at 15:33

score 8 · Answer 2 · answered Oct 17 '19 at 17:37

Erlang/OTP 22 (and possibly earlier) provides :erts_debug.same/2, which will allow you to do the desired memory pointer test. However, note the function is undocumented and in a module named erts_debug, so you should only rely on it for debugging and testing, and never in production code.

In my almost 9 years using Erlang/Elixir, I have only used it once, which was to test that we are not needlessly allocating structs in Ecto. Here is the commit for reference.
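
A small sketch of how :erts_debug.same/2 can be tried out, for debugging only. SameDemo and build/2 are hypothetical names; the helper exists so that each call allocates a fresh tuple at run time instead of reusing a compile-time literal.

```elixir
defmodule SameDemo do
  # Hypothetical helper: a new tuple is allocated on every call.
  def build(a, b), do: {a, b}
end

x = SameDemo.build(1, 2)
y = SameDemo.build(1, 2)

IO.inspect(x == y)                  # true: same content
IO.inspect(:erts_debug.same(x, x))  # true: the very same term in memory
IO.inspect(:erts_debug.same(x, y))  # false: equal, but separate allocations
```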

mljrg · Answer 3 · 2018-12-20T00:30:06.947

Let me answer my question:

There is no need for developers to do pointer comparison explicitly, because Elixir already does that internally, in pattern matching and in operators == and === (via the corresponding Erlang operators).

For example, given

a1 = {0, {1, 2}}
a2 = {1, {1, 2}}
x = {a1, a2}
s = {1, 2}
b1 = {0, s}
b2 = {1, s}
y = {b1, b2}

in IEx we have

Interactive Elixir (1.7.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> a1 = {0, {1, 2}}
{0, {1, 2}}
iex(2)> a2 = {1, {1, 2}}
{1, {1, 2}}
iex(3)> x = {a1, a2}
{{0, {1, 2}}, {1, {1, 2}}}
iex(4)> s = {1, 2}
{1, 2}
iex(5)> b1 = {0, s}
{0, {1, 2}}
iex(6)> b2 = {1, s}
{1, {1, 2}}
iex(7)> y = {b1, b2}
{{0, {1, 2}}, {1, {1, 2}}}
iex(8)> :erts_debug.size(x)
15
iex(9)> :erts_debug.size(y)
12
iex(10)> x == y
true
iex(11)> x === y
true

That is, x and y are equal in content but differ in memory layout: y occupies less memory than x because it internally shares the substructure s.

In short, == and === do both content and pointer comparison. Pointer comparison is the most efficient way for Erlang to avoid traversing the same substructure on both sides of the comparison, thus saving lots of time for large shared substructures.

Now, if structural duplication across two structures is a concern, like when they are loaded from two large files with similar content, then one must compress them into two new structures sharing the parts in which they are content equal. This was the case of a1 and a2 which were compressed as b1 and b2.
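
One way such compression could be sketched, under clear assumptions: the module name Dedup and the function share/1 are invented for this example, only tuples are handled (lists, maps and binaries pass through untouched), and everything stays inside a single process. The idea is to rebuild the term bottom-up while keeping a map of tuples already seen, so that every later equal tuple is replaced by the first copy.

```elixir
defmodule Dedup do
  # Hypothetical sketch: rebuilds `term` so that equal tuple subterms
  # all point to one shared copy. Non-tuple terms are left as they are.
  def share(term) do
    {shared, _seen} = intern(term, %{})
    shared
  end

  defp intern(term, seen) when is_tuple(term) do
    # Intern the elements first, threading the map of seen tuples through.
    {elems, seen} = Enum.map_reduce(Tuple.to_list(term), seen, &intern/2)
    canonical(List.to_tuple(elems), seen)
  end

  defp intern(term, seen), do: {term, seen}

  # Return the copy stored earlier if an equal tuple was already seen.
  defp canonical(tuple, seen) do
    case seen do
      %{^tuple => shared} -> {shared, seen}
      _ -> {tuple, Map.put(seen, tuple, tuple)}
    end
  end
end

# A round trip through the external term format yields a term with no
# internal sharing, similar to data freshly decoded from a file.
x =
  {{0, {1, 2}}, {1, {1, 2}}}
  |> :erlang.term_to_binary()
  |> :erlang.binary_to_term()

y = Dedup.share(x)

IO.inspect(:erts_debug.size(x))  # 15
IO.inspect(:erts_debug.size(y))  # 12
IO.inspect(x == y)               # true
```

The lookup map costs time and memory while it exists, so this only pays off when the duplicated substructures are large or numerous.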

FYI: `===` is mapped to Erlang's `=:=` which _compares types and delegates to `==` if types are the same_. It cannot be more efficient by any means since it performs one additional operation. — Aleksei Matiushkin, Sep 10 '18 at 15:55
@mudasobwa That's not entirely correct description: e.g. it would imply `{1.0, 1.0} === {1, 1}` to be true, since both are tuples and `==` for them is true. Rather, it's recursive just like `==` except with a single different base case: it never treats floats and integers as equal. — Alexey Romanov, Sep 10 '18 at 16:19
@AlexeyRomanov yes, indeed, I missed the recursion part, thanks. — Aleksei Matiushkin, Sep 10 '18 at 16:21
@AlexeyRomanov So the two operators differ only in how they treat floats and integers. — mljrg, Sep 10 '18 at 17:00

Aleksei Matiushkin · Answer 4 · 2018-09-10T13:12:15.487

"as far as I know creates three tuples {1, 2} at different memory locations."

Nope, that’s not correct. Erlang VM is smart enough to create a single tuple and refer to it.

It’s worth mentioning that this is possible because everything is immutable.

Also, if you find yourself accomplishing the task as above, you are doing it plain wrong.

Do you expect me to type the “Erlang VM in a nutshell” book here? – Aleksei Matiushkin Sep 10 '18 at 13:13
How can I create the same tuple at different memory locations inside the same Erlang process? – mljrg Sep 10 '18 at 13:14
You cannot, and you should not. – Aleksei Matiushkin Sep 10 '18 at 13:14
`Do you expect me to type ...` really? Can't you be succinct and explain in one paragraph how Erlang does it? I am curious to know how efficiently that is done at runtime. – mljrg Sep 10 '18 at 13:16
If I was curious about how fast it is, I would do some benchmarks. You might read [this blogpost](http://blog.erlang.org/Memory-instrumentation-in-OTP-21/) and follow links given there. – Aleksei Matiushkin Sep 10 '18 at 13:25
Suppose I create two structures from two files in the same process, and that these files have commonalities. Will those structures "automagically" share the same parts of the structure corresponding to the same content in the files? If not, how can I detect that? – mljrg Sep 10 '18 at 13:30
Maybe yes, maybe no. You cannot rely on that, that’s the point. – Aleksei Matiushkin Sep 10 '18 at 13:31

score 2 · Answer 5 · answered Sep 10 '18 at 13:39

It seems you cannot get the memory location of a variable in Erlang, which I think is a key notion in this topic. Therefore, you can only compare the data, not the pointers to that data.

It seems that when you create several variables with the same value, the VM creates new data in memory, namely the name of the variable and its binding to the primary data (which looks a lot like a pointer). The Erlang VM does not duplicate the data itself (I am looking for some proof of that; so far, it is just the way I see it).

Nathan Ripert

Please, if you find the proof, write it here. – mljrg Sep 10 '18 at 13:47
@mljrg I will. I got hooked by the curiosity. – Nathan Ripert Sep 10 '18 at 13:48
