Sort an array of arrays based on the order in another array

Question

I have an array of arrays:

x = [
  ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1],
  ["delivered", 23], ["scheduled", 1], ["canceled", 51]
]

My sorting array is:

order_array = [
  "ready", "in_progress", "recieved", "shipped", "scheduled", "pick_up",
  "delivered", "canceled", "failed", "refunded", "refund_failed"
]

I need to order x based on the value of the first element in each subarray. The required sorted array is:

[
  ["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23],
  ["canceled", 51], ["refunded", 1]
]

Using sort_by doesn't produce the required order; it returns the array unchanged:

result = x.sort_by {|u| order_array.index(u)}
# => [
#      ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1],
#      ["delivered", 23], ["scheduled", 1], ["canceled", 51]
# ]

What about `["pending", 1]` – should it be removed because `"pending"` is not an element of `order_array`? — Stefan, Mar 13 '19 at 11:58
RE `order_array[2]`: chant, " 'I' before 'E' except after 'C' or when sounding like 'A' in 'neighbor' or 'weigh' ". (Exceptions exist.) — Cary Swoveland, Mar 13 '19 at 19:20

sawa · Accepted Answer · 2019-03-13T12:20:47.530

5

h = x.to_h
# => {"ready"=>5,
# "shipped"=>1,
# "pending"=>1,
# "refunded"=>1,
# "delivered"=>23,
# "scheduled"=>1,
# "canceled"=>51}

order_array.map{|key| [key, h[key]] if h.key?(key)}.compact
# => [["ready", 5],
# ["shipped", 1],
# ["scheduled", 1],
# ["delivered", 23],
# ["canceled", 51],
# ["refunded", 1]]

or

h = x.to_h{|k, v| [k, [k, v]]}
#=> {"ready"=>["ready", 5],
# "shipped"=>["shipped", 1],
# "pending"=>["pending", 1],
# "refunded"=>["refunded", 1],
# "delivered"=>["delivered", 23],
# "scheduled"=>["scheduled", 1],
# "canceled"=>["canceled", 51]}

order_array.map{|k| h[k]}.compact
#=> [["ready", 5],
# ["shipped", 1],
# ["scheduled", 1],
# ["delivered", 23],
# ["canceled", 51],
# ["refunded", 1]]

or

h = x.to_h{|k, v| [k, [k, v]]}
#=> {"ready"=>["ready", 5],
# "shipped"=>["shipped", 1],
# "pending"=>["pending", 1],
# "refunded"=>["refunded", 1],
# "delivered"=>["delivered", 23],
# "scheduled"=>["scheduled", 1],
# "canceled"=>["canceled", 51]}

h.values_at(*order_array).compact
#=> [["ready", 5],
# ["shipped", 1],
# ["scheduled", 1],
# ["delivered", 23],
# ["canceled", 51],
# ["refunded", 1]]
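As a side note on why the last variant works: values_at returns nil for every key that is missing from the hash, and compact then drops those nils. A minimal sketch, assuming a shortened, made-up hash and order list (only Hash#values_at and Array#compact come from the answer above):

```ruby
# Shortened data, for illustration only (not the full data from the question).
h = { "ready" => ["ready", 5], "shipped" => ["shipped", 1] }
order = ["ready", "in_progress", "shipped"]

# Keys missing from h ("in_progress" here) come back as nil.
looked_up = h.values_at(*order)
p looked_up  #=> [["ready", 5], nil, ["shipped", 1]]

# compact removes the nils and keeps the order of the remaining pairs.
result = looked_up.compact
p result     #=> [["ready", 5], ["shipped", 1]]
```

This is also why entries such as ["pending", 1] disappear: their key is never asked for.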

edited Mar 13 '19 at 12:20

answered Mar 13 '19 at 11:37

sawa


Thanks, this works, but @SRack's solution is much simpler. – Selim Alawwa Mar 13 '19 at 11:42
@Stefan Right. I implicitly assumed so. – sawa Mar 13 '19 at 12:04
There are no duplicates. The second solution is also perfect! – Selim Alawwa Mar 13 '19 at 12:15
Nice, I didn't know that 2.6 added a block variant for `to_h`. That voids my above comment. – Stefan Mar 13 '19 at 12:24
@Stefan It was according to my request. – sawa Mar 13 '19 at 12:29

SRack · Answer 2 · 2019-03-13T18:04:36.623

4

You're almost there: index isn't working because you're passing it the whole subarray rather than its first element, so it returns nil every time. This will work:

result = x.sort_by { |u| order_array.index(u[0]) || 100 }
#=> [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1], ["pending", 1]]

Please note, the 100 is there to default to the back of the sort if the value isn't found in order_array.
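To make the difference concrete, here is a minimal sketch using the data from the question. The `rank` hash and the `order_array.size` fallback are assumptions added for illustration: they replace the repeated index lookups and the literal 100, and are not part of the original answer.

```ruby
x = [
  ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1],
  ["delivered", 23], ["scheduled", 1], ["canceled", 51]
]
order_array = [
  "ready", "in_progress", "recieved", "shipped", "scheduled", "pick_up",
  "delivered", "canceled", "failed", "refunded", "refund_failed"
]

# index compares against whole elements, so a [name, count] pair is never found.
whole = order_array.index(["ready", 5])  # nil
first = order_array.index("ready")       # 0

# Hypothetical helper: status name => position in order_array.
rank = order_array.each_with_index.to_h

# Names missing from order_array get order_array.size, one past the last position,
# so they sort to the back however long order_array grows.
result = x.sort_by { |name, _count| rank.fetch(name, order_array.size) }
p result
#=> [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23],
#    ["canceled", 51], ["refunded", 1], ["pending", 1]]
```

If several entries were missing from order_array they would all share the same sort key, and sort_by does not promise to keep their relative order.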

Edit

This was initially accepted even though the output includes ["pending", 1], which suggested it fit the requirements. However, here's a solution that avoids the unwanted entry and also handles duplicates should the need arise.

order_array.each_with_object([]) { |ordered_by, array| array.push(*x.select { |item| item[0] == ordered_by }) }
#=> [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1]]

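Or, a much faster variant that still allows for duplicate values under each ordered item:

hash = x.each_with_object(Hash.new { |h, k| h[k] = [] }) { |item, h| h[item[0]] << item[1] }
# fetch with a default skips keys absent from x; each stored value is paired with its key again
order_array.flat_map { |key| hash.fetch(key, []).map { |value| [key, value] } }
#=> [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1]]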

Benchmark

Here's a benchmark for this scenario with a larger dataset: https://repl.it/repls/SentimentalAdequateClick. Looks like Sawa's methods lead the way, though my last effort works handily should there be duplicate values in future. Also, my second effort sucks (which surprised me a little) :)

edited Mar 13 '19 at 18:04

answered Mar 13 '19 at 11:37

SRack


That does not give what the OP wanted. – sawa Mar 13 '19 at 11:38
Can see it's including `pending`, which I hadn't noticed was missing above @sawa. Your approach makes more sense if this is essential. Cheers for the comment. – SRack Mar 13 '19 at 11:42
What is the output that you really wanted? – sawa Mar 13 '19 at 11:44
I think this fits with the OP's question? The output is added to my answer: it matches the required output, albeit with `["pending", 1]` bumped to the back of the array. – SRack Mar 13 '19 at 11:47
As you have noted, `["pending", 1]` is added at the end, which the OP has not wanted (according to the question). – sawa Mar 13 '19 at 11:48
Yep, agree - I've upvoted yours, and mentioned above I prefer your approach due to that. Overlooked the mistake ahead of your comment. This seems to fit Selim's requirements, although yours is definitely the correct one as per the question. – SRack Mar 13 '19 at 11:50
Edited to include a working solution, which I think is pretty effective @SelimAlawwa. – SRack Mar 13 '19 at 13:17
Very good! Maybe you want to add the link to the benchmark directly in your answer. – iGian Mar 13 '19 at 18:00
@SRack, I noticed that the benchmark runs in a Ruby 2.5 environment, so the last two methods from Sawa don't return the expected result. Tested on 2.6, they are still a lot faster. I edited my post with a further option. – iGian Mar 13 '19 at 19:38
Thanks @iGian - I'd tested the same on my machine with better results, though hadn't realised Repl.it was on 2.5. Appreciate the clarification. – SRack Mar 14 '19 at 09:33

score 4 · Answer 3 · answered Mar 13 '19 at 15:40

assoc seems helpful: "Searches through an array whose elements are also arrays comparing obj with the first element of each contained array using obj.==."

order_array.map{|e| x.assoc(e) }.compact
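A small sketch of how assoc behaves, assuming a shortened array with one duplicated key (the data is made up for illustration):

```ruby
# Shortened, made-up data with a duplicated "ready" entry.
x = [["ready", 5], ["shipped", 1], ["ready", 8]]

# assoc returns the first subarray whose first element equals the argument...
found = x.assoc("ready")
p found    #=> ["ready", 5]

# ...and nil when nothing matches, which is why the answer ends with compact.
missing = x.assoc("pending")
p missing  #=> nil
```

Note that only the first match is returned, so this approach fits the case where the first elements are unique.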

answered Mar 13 '19 at 15:40

steenslag


Never seen nor heard of `assoc` before - +1 for bringing it into my life :) Great answer. – SRack Mar 13 '19 at 15:51
You know that hash is faster :) https://stackoverflow.com/a/5552062/5239030 – iGian Mar 13 '19 at 18:02

iGian · Answer 4 · 2019-03-13T19:36:16.043

2

I'd suggest filtering first, since some values (for example "pending") are not elements of order_array:

# select leaves x untouched (keep_if would modify it in place)
x.select { |e| order_array.include? e[0] }.sort_by { |e| order_array.index(e[0]) }
#=> [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1]]

I benchmarked the answers posted so far, running each one 500 times:

#        user       system     total       real
# sawa   0.006698   0.000132   0.006830 (  0.006996) # on the first method
# ray    0.005543   0.000123   0.005666 (  0.005770)
# igian  0.001923   0.000003   0.001926 (  0.001927)
# srack  0.005270   0.000168   0.005438 (  0.005540) # on the last method
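The script behind these numbers is not shown here, so the following is only a sketch of how such a comparison could be set up with Ruby's standard Benchmark module. The lambda names `via_hash` and `via_index` are hypothetical labels for two of the approaches in this thread, and timings will differ by machine and Ruby version.

```ruby
require "benchmark"

x = [
  ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1],
  ["delivered", 23], ["scheduled", 1], ["canceled", 51]
]
order_array = [
  "ready", "in_progress", "recieved", "shipped", "scheduled", "pick_up",
  "delivered", "canceled", "failed", "refunded", "refund_failed"
]
n = 500

# Hash lookup per key of order_array (first method of the accepted answer).
via_hash = lambda do
  h = x.to_h
  order_array.map { |key| [key, h[key]] if h.key?(key) }.compact
end

# Filter, then sort by position found with index.
via_index = lambda do
  x.select { |e| order_array.include?(e[0]) }.sort_by { |e| order_array.index(e[0]) }
end

# Benchmark.bm prints user, system, total and real time for each report.
Benchmark.bm(10) do |b|
  b.report("via_hash")  { n.times { via_hash.call } }
  b.report("via_index") { n.times { via_index.call } }
end
```

Checking that both lambdas return the same array before timing them is a cheap way to avoid comparing methods that do different work.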

Just for fun I tried to find a faster method for Ruby 2.5:

xx = x.to_h # plain to_h (no block), so this also runs on Ruby versions before 2.6
order_array.each_with_object([]) { |k, res| res << [k, xx[k]] if xx.key?(k) }

edited Mar 13 '19 at 19:36

answered Mar 13 '19 at 11:55

iGian


This was my thought as an edit for mine when I realised my mistake, though it's more complex than @sawa's answer, so I didn't think it worthwhile. Still pretty readable though. – SRack Mar 13 '19 at 11:58
It loops twice (increasing complexity), and `index` is also quite slow for large arrays. – ray Mar 13 '19 at 12:26
@ray, actually the benchmark surprised me. Maybe you can double check? I considered the array to be small in the OP's case. – iGian Mar 13 '19 at 15:47
@iGian credit where it's due - this does outperform both Ray's and my second answer, even as it scales. I've benchmarked with a larger dataset ([here](https://repl.it/repls/SentimentalAdequateClick)) and it's basically possible to conclude Sawa is king :) – SRack Mar 13 '19 at 17:15

score 1 · Answer 5 · answered Mar 13 '19 at 12:23

You can try the code below to get the output efficiently:

order_array.map { |p| x.detect { |y| y[0] == p } }.compact
# => [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1]]

answered Mar 13 '19 at 12:23

ray


Cary Swoveland · Answer 6 · 2019-03-13T22:11:59.127

I've assumed:

the first element of each element of x is not necessarily unique;
all elements of x whose first element is the same and whose first element is a member of order_array appear consecutively in the returned (sorted) array in the order in which those elements appear in x;
any elements of x whose first element is not a member of order_array appear in the returned (sorted) array after all elements whose first element is in order_array, and all such elements appear in the returned array (at the end) in the order in which they occur in x; and
efficiency is paramount.

x = [
  ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1], ["originated", 3],
  ["delivered", 23], ["scheduled", 1], ["ready", 8], ["canceled", 51]
]

order_array = [
  "ready", "in_progress", "received", "shipped", "scheduled", "pick_up",
  "delivered", "canceled", "failed", "refunded", "refund_failed"
]

order_pos = order_array.each_with_object({}) { |word,h| h[word] = [] }
  #=> {"ready"=>[], "in_progress"=>[], "received"=>[], "shipped"=>[],
  #    "scheduled"=>[], "pick_up"=>[], "delivered"=>[], "canceled"=>[],
  #    "failed"=>[], "refunded"=>[], "refund_failed"=>[]} 
back = x.each_with_index.with_object([]) { |((word,v),i),back|
  order_pos.key?(word) ? (order_pos[word] << i) : back << [word,v] }
  #=> [["pending", 1], ["originated", 3]] 
order_pos.flat_map { |word,offsets| offsets.map { |i| x[i] } }.concat(back)
  #=> [["ready", 5], ["ready", 8], ["shipped", 1], ["scheduled", 1],
  #    ["delivered", 23], ["canceled", 51], ["refunded", 1], ["pending", 1],
  #    ["originated", 3]]

Note:

order_pos
  #=> {"ready"=>[0, 7], "in_progress"=>[], "received"=>[], "shipped"=>[1],
  #    "scheduled"=>[6], "pick_up"=>[], "delivered"=>[5], "canceled"=>[8],
  #    "failed"=>[], "refunded"=>[3], "refund_failed"=>[]}

It is necessary to initialise order_pos up front so that its keys are ordered as in order_array. This is an example of the worth of a controversial change made in Ruby 1.9, which guaranteed that hash keys remain in key-insertion order.
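A minimal sketch of that point, assuming a shortened order_array (the `lazy` hash is a hypothetical counter-example added for contrast):

```ruby
order_array = ["ready", "shipped", "delivered"]

# Pre-seeding fixes the key order before any data is seen.
order_pos = order_array.each_with_object({}) { |word, h| h[word] = [] }
order_pos["delivered"] << 0
order_pos["ready"] << 1
p order_pos.keys  #=> ["ready", "shipped", "delivered"]

# Without pre-seeding, keys appear in the order the data happens to arrive.
lazy = {}
lazy["delivered"] = [0]
lazy["ready"] = [1]
p lazy.keys       #=> ["delivered", "ready"]
```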
