3

I'm new to functional/immutable programming and I've hit a wall. I'm trying to implement a very simple deduplication function in Elixir to deduplicate stdin.

I have a trivial implementation using Stream.transform/3, but my original implementation used Stream.filter/2 like this (simplified for the example), and I don't understand why it doesn't work:

hashes = HashSet.new

IO.stream(:stdio, :line)
  |> Stream.filter(fn(line) ->
      if HashSet.member?(hashes, line) do
        false
      else
        hashes = HashSet.put(hashes, line)
        true
      end
    end)
  |> Enum.each(&IO.write(&1))

The idea is clearly that there's a Set containing the lines read in, and it's updated on each loop.

Now, some debugging led me to the fact that hashes inside the filter callback is empty on each loop, so I guess it's not changing the outer variable? I believe I just want to rebind the variable on the outside, rather than the one inside the filter function. Is this possible?

I'm thinking that I'm hitting a scoping issue as demonstrated by this JavaScript (it's the only comparison I can think of):

var hashes = new Set();

arr.filter(function (element) {
    var hashes = something(element); // i.e. using var not using outer scope
});

Can anybody clarify exactly what is incorrect with the above implementation? Thanks in advance :)

whitfin

2 Answers

8

From https://elixir-lang.readthedocs.org/en/latest/technical/scoping.html#function-clause-scope:

Each function clause defines a new lexical scope: any new variable bound inside it will not be available outside of that clause

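Because variables in Elixir are immutable bindings, assigning to hashes inside your inner function does not modify the outer hashes. It binds a new variable, local to that function clause, on every call. The closure only ever sees the value the outer hashes had when the function was defined, which is the empty set.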
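A minimal sketch of that behaviour, and of one way around it: carry the set along as an accumulator with Stream.transform/3 instead of trying to rebind it. This assumes a recent Elixir, where MapSet stands in for the deprecated HashSet; the module name Dedup and the sample list are made up for illustration.

```elixir
defmodule Dedup do
  # Hypothetical helper: lazily drops elements that were already seen.
  # The set of seen elements is threaded through as the accumulator.
  def unique(enumerable) do
    Stream.transform(enumerable, MapSet.new(), fn item, seen ->
      if MapSet.member?(seen, item) do
        # Emit nothing, keep the accumulator unchanged.
        {[], seen}
      else
        # Emit the item and pass the updated set to the next call.
        {[item], MapSet.put(seen, item)}
      end
    end)
  end
end

seen = MapSet.new()

# Rebinding inside the closure creates a new local `seen`;
# the outer binding is untouched.
add = fn item ->
  seen = MapSet.put(seen, item)
  MapSet.size(seen)
end

add.("a")          # 1
add.("b")          # 1 again: each call starts from the captured empty set
MapSet.size(seen)  # 0

Dedup.unique(["a\n", "b\n", "a\n"]) |> Enum.to_list()
# ["a\n", "b\n"]
```

For stdin, something like IO.stream(:stdio, :line) |> Dedup.unique() |> Enum.each(&IO.write/1) should behave the same way.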

manukall
  • So there's no way to rebind something in an outer scope? I was hoping there might be a way to do so manually. – whitfin Dec 09 '15 at 17:03
  • I think not. However, it's also just not idiomatic. You don't usually call functions for side effects, but for their return value. Usually you use some function from Enum or Stream then. `Stream.uniq/2` would work here and, more generally, `Enum.reduce/3` can do pretty much anything. – manukall Dec 09 '15 at 19:54
  • The reason for this is that there are parallel implementations of Stream.filter but none for Stream.uniq ;) thanks for the info though. – whitfin Dec 09 '15 at 21:20
  • As a reference, see José's answer on this one: http://stackoverflow.com/questions/29924170/elixir-looping-through-and-adding-to-map – Tilo Aug 09 '16 at 16:43
0

Why not use Stream.uniq/2 to filter out all duplicate lines?

IO.stream(:stdio, :line)
|> Stream.uniq
|> Enum.each(&IO.write(&1))
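If uniqueness should depend on part of each line rather than the whole line, a sketch along these lines might work. It assumes a recent Elixir release, where as far as I know the key-based form is Stream.uniq_by/2; the comma-separated sample data and the key function are hypothetical.

```elixir
lines = ["1,apple\n", "2,pear\n", "1,banana\n"]

# Hypothetical key: treat the text before the first comma as the identity.
key = fn line -> line |> String.split(",", parts: 2) |> hd() end

# Keeps the first line seen for each key, lazily.
result = lines |> Stream.uniq_by(key) |> Enum.to_list()
# ["1,apple\n", "2,pear\n"]
```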
0x0me
  • This is just an example - what if I wanted to parse and unique based on some value on each line? – whitfin Dec 09 '15 at 00:44
  • According to the documentation of ``uniq/2``, you can pass a function as a second argument to provide a criterion for uniqueness, e.g. ``IO.stream(:stdio, :line) |> Stream.uniq(fn(x) -> String.length(x) end) |> Enum.each(&IO.write(&1))`` – 0x0me Dec 09 '15 at 08:04
  • Well, that's awesome. I had thought that might be the case but it had never seemed to work for me. Maybe I need to update. – whitfin Dec 09 '15 at 17:03