
I need an efficient structure for an array of thousands of elements of the same type, with the ability to do random access.

While a list is the most efficient structure for iteration and prepending, it is too slow on random access, so it does not fit my needs.

A map works better. However, it adds some overhead because it is intended for key-value pairs where the key may be anything, while I need an array with indexes from 0 to N. As a result, my app was too slow with maps. I think this overhead is not acceptable for such a simple task as handling ordered lists with random access.

I've found that a tuple is the most efficient structure in Elixir for my task. Compared to a map on my machine, it is faster

  1. on iteration - 1.02x for 1_000, 1.13x for 1_000_000 elements
  2. on random access - 1.68x for 1_000, 2.48x for 1_000_000
  3. and on copying - 2.82x for 1_000, 6.37x for 1_000_000.
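A comparison like the random-access one above could be measured roughly as follows. This is only a sketch, not the benchmark behind the numbers quoted here: the module name `AccessSketch` and its functions are hypothetical, and a single `:timer.tc` run is much noisier than a proper benchmarking tool.

```elixir
defmodule AccessSketch do
  # Hypothetical helper module, for illustration only.

  # Sum the elements found at `indexes` using O(1) tuple lookups.
  def sum_tuple(tuple, indexes) do
    Enum.reduce(indexes, 0, fn i, acc -> acc + elem(tuple, i) end)
  end

  # Sum the same elements using map lookups keyed by integer index.
  def sum_map(map, indexes) do
    Enum.reduce(indexes, 0, fn i, acc -> acc + Map.fetch!(map, i) end)
  end

  # Build both structures from the same data and time n random lookups on each.
  def run(n) do
    values = Enum.to_list(1..n)
    tuple = List.to_tuple(values)
    map = values |> Enum.with_index() |> Map.new(fn {v, i} -> {i, v} end)
    indexes = for _ <- 1..n, do: :rand.uniform(n) - 1

    {tuple_us, tuple_sum} = :timer.tc(fn -> sum_tuple(tuple, indexes) end)
    {map_us, map_sum} = :timer.tc(fn -> sum_map(map, indexes) end)

    %{tuple_us: tuple_us, map_us: map_us, same_result: tuple_sum == map_sum}
  end
end
```

Calling `AccessSketch.run(1_000_000)` returns the two timings in microseconds; the actual ratio will depend on the machine and the Erlang/OTP version.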

As a result, my code on tuples is 5x faster than the same code on maps. It probably needs no explanation why a tuple is more efficient than a map. The goal is achieved, but everybody says "don't use tuples for a list of similar elements", and nobody can explain this rule (for an example of such advice, see https://stackoverflow.com/a/31193180/5796559).

Btw, there are tuples in Python. They are also immutable, but still iterable.

So,

1. Why are tuples not enumerable in Elixir? Is there any technical or logical limitation?

2. And why should I not use them as lists of similar elements? Are there any downsides?

Please note: the question is "why", not "how". The explanation above is just an example where tuples work better than lists and maps.

raacer
  • See this post: http://stackoverflow.com/a/31193180/5796559 – KA01 Dec 26 '16 at 17:06
  • I don't agree with "don't use tuples for a list of similar elements". Do you have links to someone saying this? Maybe the context was different? I have used tuples in the past where I wanted very fast random access into a list by index (identical to your use case) and it was a huge speedup for my use case as indexing became O(1) instead of O(n). – Dogbert Dec 26 '16 at 17:08
  • @KeithA That post does not answer my question. – raacer Dec 26 '16 at 18:07
  • @Dogbert The link that KeithA posted above contains examples where people say "tuples aren't meant to be iterated over". Otherwise, why are they not enumerable? Btw, you can also achieve a huge speedup by replacing lists with maps. – raacer Dec 26 '16 at 18:12
  • @Dogbert One could use tuples for similar elements but it does strike me as a hack (and not the good kind of hack either). If you need something like that, just use the Erlang array module (linked below). – Onorio Catenacci Dec 27 '16 at 02:33
  • The whole question sounds to me as **a premature optimization**. I am a strong believer in that list/map/tuple/array choice here won’t be a bottleneck under any circumstances. – Aleksei Matiushkin Dec 27 '16 at 06:27
  • @raacer When I used tuples, maps were way slower than they are now (Erlang 17 IIRC). – Dogbert Dec 27 '16 at 06:42
  • @mudasobwa "under any circumstances"? Try measuring the time to look up the 1000th element in a list vs tuple. – Dogbert Dec 27 '16 at 09:29
  • @OnorioCatenacci As far as I can see, Erlang `:array` is built on top of tuples. There's no need to use arrays if you don't want the extra features like being able to modify the contents after creation. Array lookups by index will also necessarily be slower than tuple lookup. – Dogbert Dec 27 '16 at 09:31
  • @mudasobwa No, it is not premature. – raacer Dec 27 '16 at 10:20
  • @raacer how long does it take your program to process with lists or arrays? – Ninigi Dec 27 '16 at 10:26
  • @Ninigi I do random access on about 100_000 elements. Lists of that length are 3700x slower than tuples on random access, so I do not even try to use lists. I've tried maps, which are 5x slower on my task. I have not tried :array yet, but it is also not enumerable in Elixir, so replacing the tuple with :array would not help code readability. – raacer Dec 27 '16 at 10:51
  • The point is, you are (rightfully) stating that x and y is slower. @mudasobwa said it smells like premature optimization, which would be the case, if you want to go for the fastest possibility although another would still be processed in an acceptable timeframe. It sounds like you are doing exactly that, since your only argument so far is "No, it's not premature", instead of "5 seconds processing time is not acceptable" or something like that :) – Ninigi Dec 27 '16 at 10:56
  • @Ninigi Ok, I'll say it for you and mudasobwa: "0.5 seconds processing time is not acceptable". I need at most 0.1. And this is true, believe me please. Does this sound good to you? But again, I'm not asking how and when to optimize my application. I just explained why I want to use tuples. – raacer Dec 27 '16 at 11:02
  • You explained you wanted to use tuples because they are faster, someone told you it smells like premature optimization, you said "no it's not, it's faster" which doesn't make sense. Since no one knows if you really know what you're doing, they tried to give the best advice instead of just giving braindead answers. I'm just trying to say, it would be better to give a reason, as you did by saying you need 0.1 or less seconds processing time :) – Ninigi Dec 27 '16 at 11:06
  • So if you need that level of performance look at Erlang's NIF. – Onorio Catenacci Dec 27 '16 at 11:06
  • @OnorioCatenacci Tuple works well for me, I just want to know why it is not enumerable and why it is not meant to be used as array of similar elements. – raacer Dec 27 '16 at 11:12
  • @OnorioCatenacci If everybody on Stack Overflow knows how to use tuples in Elixir, I can hope somebody knows why. So I believe this is a question for Stack Overflow users :) Also, core committers are Stack Overflow users too. So you should not close the question just because you don't have an answer. Please give other people a chance to give a good answer. – raacer Dec 27 '16 at 11:22
  • @OnorioCatenacci Also please note: the second part of question is not about the language itself, but about the well known suggestion to not use tuples for similar elements. – raacer Dec 27 '16 at 11:26

2 Answers


1. The reason not to implement Enumerable for Tuple

From the retired Elixir talk mailing list:

If there is a protocol implementation for tuple it would conflict with all records. Given that custom instances for a protocol virtually always are defined for records adding a tuple would make the whole Enumerable protocol rather useless.

-- Peter Minten

I wanted tuples to be enumerable at first, and even eventually implemented Enumerable on them, which did not work out.

-- Chris Keele

How does this break the protocol? I'll try to put things together and explain the problem from the technical point of view.

Tuples. What's interesting about tuples is that they are mostly used for a kind of duck typing via pattern matching. You are not required to create a new module for a new struct every time you want some simple new type. Instead, you create a tuple, a kind of object of a virtual type. Atoms are often used as the first element to name the type, for example {:ok, result} and {:error, description}. This is how tuples are used almost everywhere in Elixir, because this is their purpose by design. They are also the basis for "records", which come from Erlang. Elixir has structs for this purpose, but it also provides the Record module for compatibility with Erlang. So in most cases tuples represent single structures of heterogeneous data that are not meant to be enumerated. Tuples should be thought of as instances of various virtual types. There is even the @type directive, which lets you define custom types based on tuples. But remember that they are virtual, and is_tuple/1 still returns true for all those tuples.
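As an illustration of the tagged-tuple convention described above, here is a minimal sketch. The module `TaggedTuples` and its functions are hypothetical names chosen for this example; only `Integer.parse/1` and `is_tuple/1` come from Elixir itself.

```elixir
defmodule TaggedTuples do
  # Hypothetical module, for illustration only.

  # Return a "virtual type" as a tagged tuple instead of defining a struct.
  def parse(str) do
    case Integer.parse(str) do
      {n, ""} -> {:ok, n}
      _ -> {:error, "not an integer: " <> str}
    end
  end

  # Dispatch on the tag by pattern matching; the tuple is never iterated.
  def describe({:ok, value}), do: "got #{value}"
  def describe({:error, reason}), do: "failed: #{reason}"
end
```

Both results are plain tuples as far as `is_tuple/1` is concerned, which is why a single protocol implementation for Tuple would apply to all of them at once.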

Protocols. Protocols in Elixir, on the other hand, are a kind of type class that provides ad hoc polymorphism. For those who come from OOP, this is loosely similar to superclasses and multiple inheritance. One important thing a protocol does for you is automatic type checking. When you pass some data to a protocol function, it checks that the data belongs to this class, i.e. that the protocol is implemented for this data type. If not, you get an error like this:

** (Protocol.UndefinedError) protocol Enumerable not implemented for {}

This way Elixir saves your code from silly mistakes and complex errors, unless you make wrong architectural decisions.
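The check can be observed directly. The sketch below assumes a stock Elixir installation where nobody has implemented Enumerable for Tuple; `ProtocolCheck` and its functions are hypothetical names, while `Enumerable.impl_for/1` and `Protocol.UndefinedError` are part of Elixir.

```elixir
defmodule ProtocolCheck do
  # Hypothetical module, for illustration only.

  # impl_for/1 returns the implementing module, or nil when there is none.
  def enumerable?(term), do: Enumerable.impl_for(term) != nil

  # Enum.count/1 dispatches through the protocol and raises for unsupported types.
  def safe_count(term) do
    {:ok, Enum.count(term)}
  rescue
    e in Protocol.UndefinedError -> {:error, e.protocol}
  end
end
```

With this in place, `ProtocolCheck.safe_count([1, 2, 3])` gives `{:ok, 3}`, while passing a tuple gives `{:error, Enumerable}` instead of silently iterating over it.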

Altogether. Now imagine we implement Enumerable for Tuple. This makes all tuples enumerable, while 99.9% of tuples in Elixir are not intended to be. All the checks are broken. The tragedy is the same as if all the animals in the world began quacking. If a tuple is accidentally passed to the Enum or Stream module, you will not see a useful error message. Instead, your code will produce unexpected results, unpredictable behaviour and possibly data corruption.

2. The reason not to use tuples as collections

Good, robust Elixir code should contain typespecs that help developers understand the code and give Dialyzer the ability to check the code for you. Imagine you want a collection of similar elements. The typespecs for lists and maps may look like this:

@type list_of_type :: [type]
@type map_of_type :: %{optional(key_type) => value_type}

But you can't write the same typespec for a tuple, because {type} means "a tuple of a single element of type type". You can write a typespec for a tuple of predefined length, like {type, type, type}, or for a tuple of any elements, like tuple(), but by design there is no way to write a typespec for a tuple of arbitrarily many similar elements. So choosing tuples to store your collection of elements means you lose this nice way of making your code robust.
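To make the difference concrete, here is a small sketch of the typespecs that can and cannot be expressed. The module `TupleSpecs` and all type names in it are hypothetical.

```elixir
defmodule TupleSpecs do
  # Hypothetical module, for illustration only.

  # A list of any length whose elements are all integers: expressible.
  @type int_list :: [integer]

  # A tuple of exactly three integers: expressible, but the length is fixed.
  @type int_triple :: {integer, integer, integer}

  # A tuple of any length: expressible, but it says nothing about the elements.
  @type any_tuple :: tuple

  # There is no syntax for "a tuple of N integers" with N left open.

  @spec sum_list(int_list) :: integer
  def sum_list(list), do: Enum.sum(list)

  @spec sum_triple(int_triple) :: integer
  def sum_triple({a, b, c}), do: a + b + c
end
```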

Conclusion

The rule not to use tuples as lists of similar elements is a rule of thumb that explains how to choose the right type in Elixir in most cases. Violating this rule may be considered a possible signal of a bad design choice. When people say "tuples are not intended for collections by design", this means not just "you are doing something unusual", but "you can break Elixir features through a wrong design in your application".

If you really want to use a tuple as a collection for some reason, and you are sure you know what you are doing, then it is a good idea to wrap it in a struct. You can implement the Enumerable protocol for your struct without the risk of breaking everything around tuples. It is worth noting that Erlang uses tuples as collections for the internal representation of array, gb_trees, gb_sets, etc.

iex(1)> :array.from_list ['a', 'b', 'c']
{:array, 3, 10, :undefined,
 {'a', 'b', 'c', :undefined, :undefined, :undefined, :undefined, :undefined,
  :undefined, :undefined}}
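The struct-wrapping suggestion above might look roughly like this. It is a minimal sketch under stated assumptions, not a tuned implementation: `TupleArray` and its `items` field are hypothetical names, and `member?/2` and `slice/1` simply return `{:error, __MODULE__}` so that Enum falls back to its default reduce-based algorithms.

```elixir
defmodule TupleArray do
  # Hypothetical wrapper struct around a tuple, for illustration only.
  defstruct items: {}

  # Build the wrapper from a list.
  def new(list) when is_list(list), do: %__MODULE__{items: List.to_tuple(list)}

  # O(1) random access by zero-based index.
  def at(%__MODULE__{items: items}, index), do: elem(items, index)
end

defimpl Enumerable, for: TupleArray do
  # The size of a tuple is known in constant time.
  def count(%TupleArray{items: items}), do: {:ok, tuple_size(items)}

  # Let Enum fall back to its default algorithms built on reduce/3.
  def member?(_array, _value), do: {:error, __MODULE__}
  def slice(_array), do: {:error, __MODULE__}

  def reduce(%TupleArray{items: items}, acc, fun) do
    do_reduce(items, 0, tuple_size(items), acc, fun)
  end

  # Walk the tuple by index, honouring the :halt and :suspend instructions.
  defp do_reduce(_items, _i, _size, {:halt, acc}, _fun), do: {:halted, acc}

  defp do_reduce(items, i, size, {:suspend, acc}, fun) do
    {:suspended, acc, &do_reduce(items, i, size, &1, fun)}
  end

  defp do_reduce(_items, size, size, {:cont, acc}, _fun), do: {:done, acc}

  defp do_reduce(items, i, size, {:cont, acc}, fun) do
    do_reduce(items, i + 1, size, fun.(elem(items, i), acc), fun)
  end
end
```

Because the protocol is implemented for the struct only, plain tuples such as {:ok, result} still raise Protocol.UndefinedError when passed to Enum by mistake.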

I am not sure if there is any other technical reason not to use tuples as collections. If somebody can provide another good explanation for the conflict between Record and the Enumerable protocol, they are welcome to improve this answer.

raacer

Since you are sure you need to use tuples there, you can get the requested functionality at the cost of compilation time. The solution below takes a long time to compile (expect ≈100s for @max_items 1000). Once compiled, the execution time should please you. The same approach is used in Elixir core to build up-to-date UTF-8 string matchers.

defmodule Tuple.Enumerable do
  defimpl Enumerable, for: Tuple do
    @max_items 1000

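    # The protocol expects {:ok, count} here, not a bare integer
    def count(tuple), do: {:ok, tuple_size(tuple)}

    # Not implemented, for the sake of compile time: fall back to reduce/3
    def member?(_, _), do: {:error, __MODULE__}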

    def reduce(tuple, acc, fun), do: do_reduce(tuple, acc, fun)

    defp do_reduce(_,       {:halt, acc}, _fun),   do: {:halted, acc}
    defp do_reduce(tuple,   {:suspend, acc}, fun)  do
      {:suspended, acc, &do_reduce(tuple, &1, fun)}
    end
    defp do_reduce({},      {:cont, acc}, _fun),   do: {:done, acc}
    defp do_reduce({value}, {:cont, acc}, fun)     do
      do_reduce({}, fun.(value, acc), fun)
    end

    Enum.each(1..@max_items-1, fn tot ->
      tail = Enum.join(Enum.map(1..tot, & "e_#{&1}"), ",")
      match = Enum.join(["value"] ++ [tail], ",")
      Code.eval_string(
        "defp do_reduce({#{match}}, {:cont, acc}, fun) do
           do_reduce({#{tail}}, fun.(value, acc), fun)
         end", [], __ENV__
      )
    end)

    # Tuples longer than @max_items have no generated clause
    defp do_reduce(huge,    {:cont, _}, _) do
      raise Protocol.UndefinedError,
            description: "too huge #{tuple_size(huge)} > #{@max_items}",
            protocol: Enumerable,
            value: huge
    end
  end
end

Enum.each({:a, :b, :c}, fn e ->  IO.puts "Iterating: #{e}" end)
#⇒ Iterating: a
#  Iterating: b
#  Iterating: c

The code above explicitly avoids the implementation of member?, since it would take even more time to compile while you have requested the iteration only.
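For comparison, iteration without a protocol implementation and without the compile-time cost could be sketched as follows. `TupleWalk` is a hypothetical module: `each/2` pays for one O(n) conversion with `Tuple.to_list/1`, and `reduce/3` walks the tuple by index with `elem/2`.

```elixir
defmodule TupleWalk do
  # Hypothetical module, for illustration only.

  # Convert once, then reuse the ordinary list machinery.
  def each(tuple, fun) when is_tuple(tuple) do
    tuple |> Tuple.to_list() |> Enum.each(fun)
  end

  # Fold over the tuple by index without building an intermediate list.
  def reduce(tuple, acc, fun) when is_tuple(tuple) do
    do_reduce(tuple, 0, tuple_size(tuple), acc, fun)
  end

  defp do_reduce(_tuple, size, size, acc, _fun), do: acc

  defp do_reduce(tuple, i, size, acc, fun) do
    do_reduce(tuple, i + 1, size, fun.(elem(tuple, i), acc), fun)
  end
end
```

Neither function touches the Enumerable protocol, so tagged tuples elsewhere in the application keep raising when they reach Enum by accident.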

Aleksei Matiushkin
  • @OnorioCatenacci frankly, this question made me implement the above. For `@max_items` set to, say, 30, it makes a lot of sense and I would probably use this code in my projects. The only thing needed to make this code robust is to gracefully fall back to `Tuple.to_list` for huge tuples. It might be particularly useful for iterating REST responses. So I would not close this question, it has value IMHO. – Aleksei Matiushkin Dec 27 '16 at 12:11
  • I'd close it because I think he's effectively asking "why aren't tuples enumerable?" He's not asking for a workaround--several viable workarounds have been offered. He's asking why the language was designed the way it was--which is a valid question but it's not within the scope of questions appropriate to ask on Stack Overflow. I still think his question should be closed. – Onorio Catenacci Dec 27 '16 at 12:50
  • I'm afraid this will take too long to compile for @max_items=100_100 and more. And this is probably an example of bad practice (please see my comment on your other answer). – raacer Dec 28 '16 at 01:29