best way to sort an array of objects by category and infinite subcategory

Question

I have a database from which I will be pulling the array of objects below. I want to create a tree structure from it.

When parent_id is nil, the record is a top-level category. If parent_id is not nil, the record is a subcategory of the category whose id equals parent_id.

The best solution I have come up with is to loop through the set to get the top-level categories, then keep looping until everything is organized. Ultimately the table will have fewer than 500 records, but there is no guarantee of that, so looping over and over seems really stupid. However, I can't think of another way to do it. Below is a sample dataset and the way it should be organized.

[{id: 1, name: "top test 1", parent_id: nil},
 {id: 2, name: "test 2", parent_id: 1},
 {id: 3, name: "test 3", parent_id: 1},
 {id: 4, name: "top test 4", parent_id: nil},
 {id: 5, name: "test 5", parent_id: 3},
 {id: 6, name: "test 6", parent_id: 4},
 {id: 7, name: "test 7", parent_id: 4}]


top test 1
  test 2
  test 3
    test 5
top test 4
  test 6
  test 7

Actual array of objects returned from the db. Still just test data.

[#<ItemsCategory id: 2, name: "test 2", parent_id: 1, created_at: "2014-03-04 17:58:46", updated_at: "2014-03-04 17:58:46">, 
#<ItemsCategory id: 3, name: "test 3", parent_id: 1, created_at: "2014-03-04 17:23:23", updated_at: "2014-03-04 17:23:23">, 
#<ItemsCategory id: 5, name: "test 4", parent_id: 3, created_at: "2014-03-06 17:48:25", updated_at: "2014-03-06 17:48:25">, 
#<ItemsCategory id: 1, name: "NEW test EDITED", parent_id: nil, created_at: "2014-03-04 17:57:21", updated_at: "2014-03-10 20:50:10">]

Possible duplicate of http://stackoverflow.com/questions/11741825/build-tree-from-edges — Judge Mental, Mar 11 '14 at 06:03
Possibly, but in that example it looks like the subdirectories go one level deep (I could have infinite subdirectories), which I think removes a certain amount of complexity from it. I guess the question I am looking to get answered is: must I iterate over the set multiple times in order to build the tree, or is there a really clever way that I am not seeing? — bonum_cete, Mar 11 '14 at 15:14
Your problem is exactly to build a tree from a list of edges, and the answer given there solves the general case. — Judge Mental, Mar 11 '14 at 17:15

Cary Swoveland · Accepted Answer · 2014-03-15T05:37:24.410

You can do it like this:

Code

def doit(data, indent = 2)
  # Index the rows by :id so that each row's parent can be looked up directly.
  d = data.each_with_object({}) { |h,g| g[h[:id]] = h }
  # Give each row its id path from the root, then sort by that path and print.
  d.each {|_,h| h[:ancestor_ids] =
    (h[:top_level_category_id] ? d[h[:parent_id]][:ancestor_ids] :[])+[h[:id]]}
   .values
   .sort_by { |h| h[:ancestor_ids] }
   .each { |h| puts ' '*((h[:ancestor_ids].size-1)*indent) + "#{h[:name]}" }
end

Demo

data=[
  {id: 1, name: "parent test 1", parent_id: nil, top_level_category_id: nil},
  {id: 2, name: "test 2", parent_id: 1, top_level_category_id: 1},
  {id: 3, name: "test 3", parent_id: 1, top_level_category_id: 1},
  {id: 4, name: "parent test 4", parent_id: nil, top_level_category_id: nil},
  {id: 5, name: "test 5", parent_id: 3, top_level_category_id: 1},
  {id: 6, name: "test 6", parent_id: 4, top_level_category_id: 4},
  {id: 7, name: "test 7", parent_id: 4, top_level_category_id: 4}
]

doit(data)
parent test 1
  test 2
  test 3
    test 5
parent test 4
  test 6
  test 7

Explanation

What we need to do is add another hash element (whose key I've named :ancestor_ids), whose value is an array of the hash's :id and those of all of its ancestors; i.e., we want to add the following elements to the respective hashes:

:ancestor_ids => [1]
:ancestor_ids => [1,2]
:ancestor_ids => [1,3]
:ancestor_ids => [4]
:ancestor_ids => [1,3,5]
:ancestor_ids => [4,6]
:ancestor_ids => [4,7]

Once we have these, we can use sort_by { |h| h[:ancestor_ids] } to put the elements of the array data in the proper order. (If you are uncertain how the elements of an array are ordered, review Array#<=>.) Also h[:ancestor_ids].size is used to determine the amount of indentation required when displaying the results.
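As a quick illustration of the ordering that sort_by relies on here (a small sketch; the variable names are only for illustration), arrays are compared element by element, and an array that is a prefix of another sorts before it:

```ruby
# Arrays are compared element by element (see Array#<=>). When one array is a
# prefix of another, the shorter one comes first.
paths = [[4], [1, 3, 5], [4, 7], [1, 2], [1], [4, 6], [1, 3]]
sorted = paths.sort
p sorted
p([1, 3] <=> [1, 3, 5])  # -1: a parent sorts before its descendants
p([1, 3, 5] <=> [4])     # -1: the whole subtree under 1 sorts before 4
```

This is why sorting by :ancestor_ids places every row directly after its parent and before the next top-level row.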

The calculations go like this*:

d = data.each_with_object({}) { |h,g| g[h[:id]] = h }
  #=> {1=>{:id=>1, :name=>"parent test 1",...},
  #    2=>{:id=>2, :name=>"test 2",...},
  #    3=>{:id=>3, :name=>"test 3",...},
  #    4=>{:id=>4, :name=>"parent test 4",...},
  #    5=>{:id=>5, :name=>"test 5",...},
  #    6=>{:id=>6, :name=>"test 6",...},
  #    7=>{:id=>7, :name=>"test 7",...}}

We perform this step to make it easy to find the rows of data that correspond to a record's parent.

e = d.each {|_,h| h[:ancestor_ids] =
    (h[:top_level_category_id] ? d[h[:parent_id]][:ancestor_ids]:[])+[h[:id]]}
  #=> {1=>{:id=>1,...,:ancestor_ids=>[1]},
  #    2=>{:id=>2,...,:ancestor_ids=>[1, 2]},
  #    3=>{:id=>3,...,:ancestor_ids=>[1, 3]},
  #    4=>{:id=>4,...,:ancestor_ids=>[4]},
  #    5=>{:id=>5,...,:ancestor_ids=>[1, 3, 5]},
  #    6=>{:id=>6,...,:ancestor_ids=>[4, 6]},
  #    7=>{:id=>7,...,:ancestor_ids=>[4, 7]}}

This adds the element whose key is :ancestor_ids. We no longer need the keys, so we will extract the values, sort them by :ancestor_ids and display the results:

f = e.values
  #=> [{:id=>1,...,:ancestor_ids=>[1]},
  #    {:id=>2,...,:ancestor_ids=>[1, 2]},
  #    {:id=>3,...,:ancestor_ids=>[1, 3]},
  #    {:id=>4,...,:ancestor_ids=>[4]},
  #    {:id=>5,...,:ancestor_ids=>[1, 3, 5]},
  #    {:id=>6,...,:ancestor_ids=>[4, 6]},
  #    {:id=>7,...,:ancestor_ids=>[4, 7]}]

g = f.sort_by { |h| h[:ancestor_ids] }
  #=> [{:id=>1,...,:ancestor_ids=>[1]},
  #    {:id=>2,...,:ancestor_ids=>[1, 2]},
  #    {:id=>3,...,:ancestor_ids=>[1, 3]},
  #    {:id=>5,...,:ancestor_ids=>[1, 3, 5]},
  #    {:id=>4,...,:ancestor_ids=>[4]},
  #    {:id=>6,...,:ancestor_ids=>[4, 6]},
  #    {:id=>7,...,:ancestor_ids=>[4, 7]}]

indent = 2
g.each { |h| puts ' '*((h[:ancestor_ids].size-1)*indent) + "#{h[:name]}" }
parent test 1
  test 2
  test 3
    test 5
parent test 4
  test 6
  test 7

Points

Do you need the hash element whose key is :top_level_category_id, considering that :parent_id => nil for top level elements?
Production code should raise an informative exception if, in the calculation of e above, d has no element with key h[:parent_id], or if d[h[:parent_id]] has no key :ancestor_ids.
This answer relies on the assumption that h[:id] > h[:parent_id] for each element h of data that is not top level (that is, whenever h[:parent_id] is not nil). If the rows of data are not initially ordered by :id, they must first be sorted by :id.
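The points above could be addressed along the following lines. This is only a sketch, not the code from this answer: the method names ancestor_paths and tree_lines are hypothetical, it tests :parent_id rather than :top_level_category_id, and it resolves each path recursively so the rows may arrive in any order.

```ruby
# Hypothetical variant: keys off :parent_id only, does not assume that a
# parent's id is smaller than its children's, and fails loudly on bad data.
def ancestor_paths(data)
  by_id = data.each_with_object({}) { |h, g| g[h[:id]] = h }
  paths = {}
  # Resolve (and memoize) the id path from the root down to the given id.
  resolve = lambda do |id, seen|
    return paths[id] if paths.key?(id)
    raise ArgumentError, "cycle detected at id #{id}" if seen.include?(id)
    row = by_id.fetch(id) { raise KeyError, "no row with id #{id}" }
    parent = row[:parent_id]
    paths[id] = (parent.nil? ? [] : resolve.call(parent, seen + [id])) + [id]
  end
  data.each { |h| resolve.call(h[:id], []) }
  paths
end

# Returns the indented names as an array of strings instead of printing them.
def tree_lines(data, indent = 2)
  paths = ancestor_paths(data)
  ordered = data.sort_by { |h| paths[h[:id]] }
  ordered.map { |h| ' ' * ((paths[h[:id]].size - 1) * indent) + h[:name] }
end

# Rows deliberately out of :id order, with no :top_level_category_id key.
rows = [
  {id: 5, name: "test 5", parent_id: 3},
  {id: 3, name: "test 3", parent_id: 1},
  {id: 1, name: "top test 1", parent_id: nil},
  {id: 4, name: "top test 4", parent_id: nil},
  {id: 2, name: "test 2", parent_id: 1}
]
puts tree_lines(rows)
```

A row whose :parent_id matches no :id raises KeyError, and a cycle raises ArgumentError, rather than failing with an unrelated error on nil.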

* If you try running this at home, it should work from the command line, but IRB and Pry cannot handle continued lines that begin with a dot.

I've marked your answer. I am trying to basically create the tree now with [...] tags at this point. Wrapping the [...] is easy enough. Any tips on how I can wrap the parents and their children in [...] tags? Thanks! — bonum_cete, Mar 14 '14 at 18:40
@isea Please look into [CGI](http://www.ruby-doc.org/stdlib-2.1.0/libdoc/cgi/rdoc/CGI.html) to build the HTML tree. If you want to generate `XML`, then you might look into [`Nokogiri`](http://nokogiri.org/). Or [`YAML`](http://ruby-doc.org/stdlib-2.0.0/libdoc/yaml/rdoc/YAML.html) — Arup Rakshit, Mar 15 '14 at 07:48
@isea, sorry I can't help with the HTML/XML. I know very little about that. — Cary Swoveland, Mar 16 '14 at 00:46
@CarySwoveland That's ok. One more question though. I'm having trouble getting some results with the actual array of objects. For some reason all children are level 1. I have copied the actual array of objects above. Thanks again! — bonum_cete, Mar 17 '14 at 21:37
Never mind, I found it. All the top_level_category fields were nil. — bonum_cete, Mar 17 '14 at 22:13
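For the markup question raised in the comments above, one possible approach is sketched below. It assumes that a nested <ul>/<li> list is what is wanted, which the thread does not confirm; the method name nested_list is hypothetical. It groups the rows by :parent_id in a single pass and then recurses from the top-level rows, so it needs neither :top_level_category_id nor any particular row order.

```ruby
require "erb"

# children maps a parent id (nil for top level) to the rows directly under it.
# Assumption: nested <ul>/<li> elements are the desired output.
def nested_list(children, parent_id = nil)
  kids = children[parent_id]
  return "" unless kids
  items = kids.map do |h|
    "<li>#{ERB::Util.html_escape(h[:name])}#{nested_list(children, h[:id])}</li>"
  end
  "<ul>#{items.join}</ul>"
end

rows = [
  {id: 1, name: "top test 1", parent_id: nil},
  {id: 2, name: "test 2", parent_id: 1},
  {id: 3, name: "test 3", parent_id: 1},
  {id: 4, name: "top test 4", parent_id: nil},
  {id: 5, name: "test 5", parent_id: 3},
  {id: 6, name: "test 6", parent_id: 4},
  {id: 7, name: "test 7", parent_id: 4}
]
html = nested_list(rows.group_by { |h| h[:parent_id] })
puts html
```

Siblings appear in the order in which they occur in rows, so sort the rows first if a specific order matters.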

Judge Mental · Answer 2 · 2014-03-12T19:07:38.203

Requires a single pass through the edge list. All nodes must fit in memory together; edge list must constitute an actual tree (that is, there's no checking for forests, proper DAGs, or cycles).

private static final Long DUMMY = null;
Node buildTree( Iterable< ? extends Edge > iedg ) {
    Map< Long, Node > mnod = new HashMap< Long, Node >();

    // One pass: attach each edge's node to its parent, creating nodes on first use.
    for ( Edge edg : iedg )
        getNode( mnod, edg.getParentId() ).addChild(
            getNode( mnod, edg.getId() ).withName( edg.getName() )
        );

    return getNode( mnod, DUMMY ).firstChild();
}

private Node getNode( Map< Long, Node > mnod, Long lId ) {
    Node nod = mnod.get( lId );
    if ( null == nod )
        mnod.put( lId, nod = new Node().withId( lId ) );
    return nod;
}
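Since the question is tagged Ruby, here is a rough Ruby sketch of the same single-pass idea. It is not a line-for-line translation: the Node struct and the method names build_forest and render are hypothetical, rows are assumed to be hashes with :id, :name and :parent_id, and all top-level nodes are returned rather than only the first child of the dummy root. As in the Java version, nothing checks for cycles or for parents that have no row of their own.

```ruby
# Hypothetical Node type; the Java code above leaves Node and Edge undefined.
Node = Struct.new(:id, :name, :children)

# One pass over the rows. A node is created the first time its id is seen,
# whether as a row or as some row's parent, so row order does not matter.
def build_forest(rows)
  nodes = Hash.new { |hash, id| hash[id] = Node.new(id, nil, []) }
  rows.each do |row|
    node = nodes[row[:id]]
    node.name = row[:name]
    nodes[row[:parent_id]].children << node
  end
  nodes[nil].children  # the nil key plays the role of DUMMY
end

# Flattens the forest into indented lines, depth first.
def render(nodes, depth = 0, indent = 2)
  nodes.flat_map do |n|
    [' ' * (depth * indent) + n.name] + render(n.children, depth + 1, indent)
  end
end

rows = [
  {id: 1, name: "top test 1", parent_id: nil},
  {id: 2, name: "test 2", parent_id: 1},
  {id: 3, name: "test 3", parent_id: 1},
  {id: 4, name: "top test 4", parent_id: nil},
  {id: 5, name: "test 5", parent_id: 3},
  {id: 6, name: "test 6", parent_id: 4},
  {id: 7, name: "test 7", parent_id: 4}
]
forest = build_forest(rows)
puts render(forest)
```

The hash's default block does the work of getNode: looking up an unknown id creates and stores an empty node for it.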

edited Mar 12 '14 at 19:07

answered Mar 11 '14 at 17:35

Thanks for this. I'm going to try to translate it to Ruby this morning. Can you explain this portion: Iterable< Edge > iedg? – bonum_cete Mar 12 '14 at 15:31
I just picked the most general argument type I could. The `iedg` parameter is only used in a `for`-loop, so it has to have at least type `Iterable< ? extends Edge >`. – Judge Mental Mar 12 '14 at 19:09
This is not Ruby code. Why post it on a question tagged Ruby? – Arup Rakshit Mar 15 '14 at 07:54
