divide function for set in Ruby 2.3.1

Question

The below is from Ruby 2.3.1 documentation for dividing a set into a set of its subsets based on certain criteria applied to each pair of elements in the original set. Basically, if two numbers in the set are within 1 unit of each other, they fall in the same subset in the set of subsets of the original set.

require 'set'
numbers = Set[1, 3, 4, 6, 9, 10, 11]
set = numbers.divide { |i,j| (i - j).abs == 1 }
p set     # => #<Set: {#<Set: {1}>,
          #            #<Set: {11, 9, 10}>,
          #            #<Set: {3, 4}>,
          #            #<Set: {6}>}>

I think the problem I am working on could use this function. Here is the problem. There is a set S of n things, together with a proximity function on pairs of things from S that has positive values for some of the pairs. For pairs whose proximity value is not specified, the value may be assumed to be 0. There is also a threshold parameter. The objective of the program is to induce a partition (a set of pairwise disjoint and collectively exhaustive subsets) on S such that two things fall into the same subset if their proximity value exceeds the threshold parameter (the converse need not be true).

The input to this program is of this form

t<-threshold parameter (a float greater than 0)

n<- number of lines to follow (integer)

Thing_i_1 Thing_j_1 proximity_ij_1 (Thing_i_1 and Thing_j_1 are integers and proximity_ij_1 is float and is greater than 0)

....

Thing_i_n Thing_j_n proximity_ij_n

The output is the aforementioned set of pairwise disjoint and collectively exhaustive subsets of the original set, such that two things whose proximity value is at least equal to the threshold parameter fall into the same subset.

I wrote the program below to accomplish this, but it fails to form subsets of the set in question. My input was this

Output should be {{1,2,5},{3},{4}} because 1,2 should fall into the same subset and so should 2,5 since proximity function value in each case exceeds the threshold parameter (so 1 and 5 effectively fall into the same subset), and 3 and 4 form subsets of their own.

require 'set'
t=gets.chomp.to_f
n=gets.chomp.to_i
edge=Struct.new(:n1,:n2)
se=Array.new
af=Array.new
sv=Set.new

for i in (0..n-1)
    s=gets.chomp.split(" ")
    se.insert(-1,edge.new(s[0],s[1]))

        af.insert(-1,s[2].to_f)

    if (sv.member? s[0])==false 
        sv.add(s[0])
    end
    if (sv.member? s[1])==false 
        sv.add(s[1])
    end
end

    c=sv.divide { |i,j|  (k=se.index(edge.new(i,j)))!=nil  && af[k]>=t }
p c

Output:

#<Set: {#<Set: {"5"}>, #<Set: {"2"}>, #<Set: {"1"}>, #<Set: {"3"}>, #<Set: {"4"}>}>

The divide function does not seem to work. Am I doing anything wrong? Why am I getting five disjoint subsets instead of the expected three? I printed out the values of the condition in the divide block and got true exactly for 1,2 and 2,5, yet 1, 2 and 5 end up in different subsets. Can someone help? Thank you.

Where you say, "... My input was this..", I assume the first line (2.5) is the threshold value, the second line gives the number of pairs for which a proximity value is given and the last three lines are of the form `value1 value2 proximity_value`. If that is correct, it would be much easier for readers to follow the question if you just said that. — Cary Swoveland, Sep 28 '16 at 01:43
What is "(2.5)"? I wanted to be formal in order to be clear. But sometimes dispensing with formality is better. I see what you are saying. Thank you. — user17144, Sep 28 '16 at 06:31

Amadan · Accepted Answer · 2016-09-28T06:34:55.970

1

divide only groups a and b together when the relation holds in both directions, i.e. block.call(a, b) && block.call(b, a). Make your se symmetric (i.e. also insert the edges 2-1, 4-3 and 5-2) and it will work. Alternatively, make your block return true if either edge.new(i,j) or edge.new(j,i) is in se. Also note the types: you store the things as strings (edge.new(s[0],s[1])) rather than integers. The lookup still matches because sv holds the same strings, but the resulting subsets then contain "1", "2", ... instead of numbers, so convert with to_i when reading.
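A minimal sketch of the first option (storing each qualifying pair in both orders). The helper name `divide_by_proximity` and the hash-of-pairs input are assumptions made for illustration, and the sample data is borrowed from the other answer's example since the question's input was not shown:

```ruby
require 'set'

# Hypothetical helper: records every pair that meets the threshold in both
# orders, so the block gives the same answer for (i, j) and (j, i).
def divide_by_proximity(proximities, threshold)
  vertices = Set.new
  close = Set.new
  proximities.each do |(a, b), prox|
    vertices << a << b
    next if prox < threshold
    close << [a, b] << [b, a]
  end
  vertices.divide { |i, j| close.include?([i, j]) }
end

# Yields the subsets {1, 2, 5}, {3} and {4} (printed order may vary).
p divide_by_proximity({ [1, 2] => 0.3, [3, 4] => 0.1, [2, 5] => 0.25 }, 0.2)
```

Keeping the symmetry in the data rather than in the block means the block stays a plain membership test.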

That said, this is very unRubyish code. If I were to rewrite it, it would go like this:

require 'set'

Edge = Struct.new(:v1, :v2, :p)
edges = {}
vertices = Set.new

t = gets.chomp.to_f
n = gets.chomp.to_i
n.times do
  v1, v2, p = *gets.chomp.split
  v1 = v1.to_i
  v2 = v2.to_i
  p = p.to_f
  edge = Edge.new(v1, v2, p)

  edges[[v1, v2]] = edge
  vertices << v1 << v2
end

c = vertices.divide { |v1, v2|
  (edge = edges[[v1, v2]] || edges[[v2, v1]]) && edge.p >= t
}

p c
# => #<Set: {#<Set: {1, 2, 5}>, #<Set: {3}>, #<Set: {4}>}>

Basically: use a hash so you can always find an edge quickly by its indices, use << for putting things into other things, remember that the whole point of a set is that it won't insert the same thing twice, remember that objects are truthy so you don't have to explicitly test for != nil, and never ever use for :)
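A small illustration of those points in isolation. The helper `collect_vertices` and the sample pairs are made up for the example:

```ruby
require 'set'

# Hypothetical helper: gathers every endpoint of the given pairs into a set.
def collect_vertices(pairs)
  vertices = Set.new
  # << returns the set itself, so calls chain; duplicates are silently ignored.
  pairs.each { |a, b| vertices << a << b }
  vertices
end

p collect_vertices([[1, 2], [2, 5], [3, 4]])

weights = { [1, 2] => 0.3 }
w = weights[[2, 5]]        # Hash#[] returns nil for a missing key
puts "no edge" unless w    # nil is falsy, so no explicit != nil is needed

3.times { |i| puts i }     # an iterator method instead of a for loop
```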

edited Sep 28 '16 at 06:34

answered Sep 28 '16 at 01:26

Amadan


Thank you. "The Ruby way" by Hal Fulton says "If the arity is 2, it (divide) will perform calls of the form block.call(a,b) to determine whether these two items belong together." And I did get true for block.call(1,2) and block.call(2,5). Can you please cite any reference for "divide will only divide where both block.call(a, b) && block.call(b, a)"? Also, aren't the i and j in |i,j| supposed to be generic elements of sv so that i and j actually stand for s[0] and s[1]? Indeed, I got true exactly for (1,2) and (2,5). Also, in your solution edges[[v1, v2]] || edges[[v1, v2]] = edges[[v1,v2]]. – user17144 Sep 28 '16 at 06:28
Did you mean to write edges[[v1, v2]] || edges[[v2, v1]]? – user17144 Sep 28 '16 at 06:33
Derp. Yes, I did. Fixed. – Amadan Sep 28 '16 at 06:34
"Can you please cite any reference" - Looked in the source. `Set#divide` uses `TSort#each_strongly_connected_component`; a "strongly connected component" is a set of nodes where there is a path from each node to every other node. For example, in [1->2, 2->3, 3->1, 3->4], nodes 1, 2, 3 are strongly connected. `Set#divide` will in your case create a graph [1->2; 2->5] - but there is no path from 5 to 2 or from 5 to 1 or from 2 to 1, thus 1, 2 and 5 are not strongly connected, thus you get five individual groups. – Amadan Sep 28 '16 at 06:46
In my example the graph `Set#divide` creates internally is [1->2, 2->1, 2->5, 5->2], so you can get from any node to any other (for example, from 5 to 1 you can get by taking 5->2, 2->1), thus [1, 2, 5] form a strongly connected component, and it works. Indeed, none of this is reflected in RubyDoc. – Amadan Sep 28 '16 at 06:47
Also notice that in the original example the criterion that was used was `{ |i,j| (i - j).abs == 1 }`, which is symmetric; it would be much simpler to write `{ |i,j| i - j == 1 }`, but then it wouldn't have worked. – Amadan Sep 28 '16 at 06:53
Your code worked even before "fixing" it. It worked even when you had the accidentally redundant edges[[v1, v2]] in edges[[v1, v2]] || edges[[v1, v2]]=edges[[v1, v2]]. Does this not indicate that it is enough only for block.call(a,b) to be true? Also, in your example of strongly connected component I suppose 1->4 and 2->4 need to be there. Your code may have worked, but I am a little confused since it worked only with edges[[v1,v2]]. – user17144 Sep 28 '16 at 06:55
For me, `edges[[v1, v2]] || edges[[v1, v2]]=edges[[v1, v2]]` makes the code not work; with `edges[[v1, v2]] || edges[[v1, v2]]=edges[[v2, v1]]`, it does (Ruby 2.3.1). In my example, [1, 2, 3] are a strongly connected subgraph, as they make a circle - you can reach any of them from any other (possibly by going through the third one); node 4 is not strongly connected with them, because while there is a way to get from any of [1, 2, 3] to 4 (e.g. 1->2->3->4), there is no way to get from 4 to any of them. – Amadan Sep 28 '16 at 07:00
Besides the possibility that another Ruby is doing something different than mine, another possibility is that the input file that you provided is not your real input file, and the input file you tested on actually does provide a symmetric relation (i.e. if there is a row `1 2 0.3`, there is also a row `2 1 0.3`). – Amadan Sep 28 '16 at 07:02
I think I know what the problem is. Although I got true for (1,2), I got false for (2,1). That explains it. What should I read to be able to write "Rubyic" code? – user17144 Sep 28 '16 at 18:02
This might sound flippant, but it's the only answer I know: Lots of Rubyish code :P – Amadan Sep 29 '16 at 00:31
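The strongly-connected-component behaviour described in the comments above can be checked by calling TSort directly. This is only a sketch: the helper `components` and the hash-of-successors graph format are assumptions for illustration, and it relies on the comment's reading of the 2.3.1 source rather than on anything the RubyDoc promises.

```ruby
require 'tsort'

# Hypothetical helper: strongly connected components of a graph given as
# a hash of node => array of successor nodes, sorted for stable output.
def components(graph)
  each_node  = ->(&blk) { graph.each_key(&blk) }
  each_child = ->(node, &blk) { graph.fetch(node).each(&blk) }
  TSort.strongly_connected_components(each_node, each_child).map(&:sort).sort
end

# The example from the comments: 1, 2, 3 form a cycle, 4 is only reachable.
p components({ 1 => [2], 2 => [3], 3 => [1, 4], 4 => [] })
#=> [[1, 2, 3], [4]]

# One-directional edges, as in the question: every node stays on its own.
p components({ 1 => [2], 2 => [5], 5 => [], 3 => [], 4 => [] })
#=> [[1], [2], [3], [4], [5]]

# Edges in both directions, as in the answer: 1, 2 and 5 are grouped.
p components({ 1 => [2], 2 => [1, 5], 5 => [2], 3 => [], 4 => [] })
#=> [[1, 2, 5], [3], [4]]
```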

Cary Swoveland · Answer 2 · 2016-09-30T00:59:04.997

Edit: I discovered that I answered a question that wasn't asked. I mistakenly thought that when [d,e] with proximity less than the threshold is considered, d and e are to be added to one of the (partially-built) sets if there is a "path" from d or e to one element of that set. I will leave my answer, however, as it may be of interest to anyone wanting to solve the problem I've addressed.

Here's another way, that doesn't use Set#divide.

Code

require 'set'

def setify(distances, threshold)
  sets = []
  g = distances.dup
  while (ret = g.find { |_,prox| prox >= threshold })
    (n1,n2),_ = ret
    s = [n1,n2].to_set
    g.delete [n1,n2]
    while (ret = g.find { |(n1,n2),_| s.include?(n1) || s.include?(n2) })
      pair,_ = ret
      s.merge pair   
      g.delete pair
    end
    sets << s
  end
  g.keys.flatten.each { |k| sets << [k].to_set }
  sets
end

Examples

threshold = 0.2

distances = { [1,2]=>0.3, [3,4]=>0.1, [2,5]=>0.25 }
setify(distances, threshold)
  #=> [#<Set: {1, 2, 5}>, #<Set: {3}>, #<Set: {4}>] 

distances = { [1,2]=>0.3, [3,4]=>0.1, [6,8]=>0.2, [2,5]=>0.25, [8,10]=>0 }
setify(distances, threshold)
  #=> [#<Set: {1, 2, 5}>, #<Set: {6, 8, 10}>, #<Set: {3}>, #<Set: {4}>]

Explanation

Suppose

threshold = 0.2
distances = { [1,2]=>0.3, [3,4]=>0.1, [6,8]=>0.2, [2,5]=>0.25, [8,10]=>0 }

Then

sets = []
g = distances.dup
  #=> {[1, 2]=>0.3, [3, 4]=>0.1, [6, 8]=>0.2, [2, 5]=>0.25, [8, 10]=>0}

As

ret = g.find { |_,prox| prox >= threshold }
  #=> [[1, 2], 0.3]

is truthy, we enter the (outer) while loop. We now wish to construct a connected set s that includes 1 and 2.

(n1,n2),_ = ret
  #=> [[1, 2], 0.3] 
s = [n1,n2].to_set
  #=> #<Set: {1, 2}>

Since [n1,n2] has been dealt with, we must remove that key from g (hence, the need for g = distances.dup, to avoid mutating distances).

g.delete [n1,n2]
  #=> 0.3

Let's see g now.

g #=> {[3, 4]=>0.1, [6, 8]=>0.2, [2, 5]=>0.25, [8, 10]=>0}

Now look for another key, [a,b], in g (not the music key of 'g'), such that a or b (or both) are in the set s. If such a key is found, attempt to add a and b to s and delete the key [a,b] from g. (At most one of the two elements of that key will be added to the set).

ret = g.find { |(n1,n2),_| s.include?(n1) || s.include?(n2) }
  #=> [[2, 5], 0.25]

A key-value pair is found, so we enter the (inner) loop.

pair,_ = ret
  #=> [[2, 5], 0.25]
pair
  #=> [2, 5] 
s.merge pair   
  #=> #<Set: {1, 2, 5}> 
g.delete pair
  #=> 0.25 
g
  #=> {[3, 4]=>0.1, [6, 8]=>0.2, [8, 10]=>0}

Now execute the while expression again.

ret = g.find { |(n1,n2),_| s.include?(n1) || s.include?(n2) }
  #=> nil

As no more keys of g are "connected" to elements of s we add s to sets,

sets << s
  #=> [#<Set: {1, 2, 5}>]

and look to continue in the outer loop.

ret = g.find { |_,prox| prox >= threshold }
  #=> [[6, 8], 0.2]

We have found the start of another set having at least one pair that meets the threshold, so we create a new set and delete the associated key of g,

(n1,n2),_ = ret
  #=> [[6, 8], 0.2]
n1 #=> 6
n2 #=> 8

s = [n1,n2].to_set
  #=> #<Set: {6, 8}> 
g.delete [n1,n2]
  #=> 0.2 
g #=> {[3, 4]=>0.1, [8, 10]=>0}

and set about building that set.

ret = g.find { |(n1,n2),_| s.include?(n1) || s.include?(n2) }
  #=> [[8, 10], 0] 
pair,_ = ret
  #=> [[8, 10], 0] 
s.merge pair   
  #=> #<Set: {6, 8, 10}> 
g.delete pair
  #=> 0 
g #=> {[3, 4]=>0.1} 

ret = g.find { |(n1,n2),_| s.include?(n1) || s.include?(n2) }
  #=> nil

so we are finished building the set s.

sets << s
  #=> [#<Set: {1, 2, 5}>, #<Set: {6, 8, 10}>]

Once again, try to enter the outer loop (i.e., see if there is another set containing a pair of elements with proximity that meets the threshold).

ret = g.find { |_,prox| prox >= threshold }
  #=> nil

Each of the elements of the keys in what's left of g must therefore comprise its own set.

b = g.keys
  #=> [[3, 4]] 
c = b.flatten
  #=> [3, 4] 
c.each { |k| sets << [k].to_set }
  #=> [3, 4]

Return sets

sets
  #=> [#<Set: {1, 2, 5}>, #<Set: {6, 8, 10}>, #<Set: {3}>, #<Set: {4}>]
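For comparison, here is a sketch of the problem as originally asked, where only pairs that meet the threshold join two things. It swaps in a simple union-find instead of either `Set#divide` or the loop above; the helper name `partition_by_threshold` is hypothetical and not part of either answer.

```ruby
require 'set'

# Hypothetical helper: partitions the endpoints so that two things share a
# subset when a chain of pairs with proximity >= threshold links them.
def partition_by_threshold(distances, threshold)
  parent = {}
  distances.keys.flatten.each { |v| parent[v] = v }

  # Follow parent links until reaching the representative of v's group.
  find = lambda do |v|
    v = parent[v] while parent[v] != v
    v
  end

  distances.each do |(a, b), prox|
    next if prox < threshold
    root_a = find.call(a)
    root_b = find.call(b)
    parent[root_a] = root_b unless root_a == root_b
  end

  parent.keys.group_by { |v| find.call(v) }.values.map(&:to_set)
end

distances = { [1,2]=>0.3, [3,4]=>0.1, [6,8]=>0.2, [2,5]=>0.25, [8,10]=>0 }
p partition_by_threshold(distances, 0.2)
```

With this data the subsets are {1, 2, 5}, {3}, {4}, {6, 8} and {10}: unlike setify, 10 is not pulled into {6, 8}, because the pair [8, 10] is below the threshold.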

Beautiful. I sensed an intuitive need for _ when I was writing the code before I chanced upon divide. Now I know why they call Ruby zen-like. Thank you very much. I am still stuck in C/C++. How do I get into the Ruby mode? — user17144, Sep 28 '16 at 18:36
I had not been aware of [Set#divide](http://ruby-doc.org/stdlib-2.3.0/libdoc/set/rdoc/Set.html#method-i-divide), and am glad to have made the acquaintance. As to how to think in Ruby, aside from recommendations you'll find for books, blogs and other internet resources, keep posting questions here, and (especially) at SO's sister-site [Code Review](http://codereview.stackexchange.com/questions/tagged/ruby). Code Review is for working code that you'd like to improve; SO is for fixing broken code and getting solutions to coding problems. — Cary Swoveland, Sep 28 '16 at 19:15
Thank you. That helps. I think "divide" internally would do what you did above. — user17144, Sep 28 '16 at 19:39
