
I have an array of arrays:

x = [
  ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1],
  ["delivered", 23], ["scheduled", 1], ["canceled", 51]
]

My sorting array is:

order_array = [
  "ready", "in_progress", "recieved", "shipped", "scheduled", "pick_up",
 "delivered", "canceled", "failed", "refunded", "refund_failed"
]

I need to order x based on the value of the first element in each subarray. The required sorted array is:

[
  ["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23],
  ["canceled", 51], ["refunded", 1]
]

Using sort_by doesn't produce the required order; it returns the array unchanged:

result = x.sort_by {|u| order_array.index(u)}
# => [
#      ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1],
#      ["delivered", 23], ["scheduled", 1], ["canceled", 51]
# ]
sawa
Selim Alawwa
  • What about `["pending", 1]` – should it be removed because `"pending"` is not an element of `order_array`? – Stefan Mar 13 '19 at 11:58
  • RE `order_array[2]`: chant, " 'I' before 'E' except after 'C' or when sounding like 'A' in 'neighbor' or 'weigh' ". (Exceptions exist.) – Cary Swoveland Mar 13 '19 at 19:20

6 Answers

h = x.to_h
# => {"ready"=>5,
# "shipped"=>1,
# "pending"=>1,
# "refunded"=>1,
# "delivered"=>23,
# "scheduled"=>1,
# "canceled"=>51}

order_array.map{|key| [key, h[key]] if h.key?(key)}.compact
# => [["ready", 5],
# ["shipped", 1],
# ["scheduled", 1],
# ["delivered", 23],
# ["canceled", 51],
# ["refunded", 1]]

or

h = x.to_h{|k, v| [k, [k, v]]} # Array#to_h with a block requires Ruby 2.6+
#=> {"ready"=>["ready", 5],
# "shipped"=>["shipped", 1],
# "pending"=>["pending", 1],
# "refunded"=>["refunded", 1],
# "delivered"=>["delivered", 23],
# "scheduled"=>["scheduled", 1],
# "canceled"=>["canceled", 51]}

order_array.map{|k| h[k]}.compact
#=> [["ready", 5],
# ["shipped", 1],
# ["scheduled", 1],
# ["delivered", 23],
# ["canceled", 51],
# ["refunded", 1]]

or

h = x.to_h{|k, v| [k, [k, v]]} # Array#to_h with a block requires Ruby 2.6+
#=> {"ready"=>["ready", 5],
# "shipped"=>["shipped", 1],
# "pending"=>["pending", 1],
# "refunded"=>["refunded", 1],
# "delivered"=>["delivered", 23],
# "scheduled"=>["scheduled", 1],
# "canceled"=>["canceled", 51]}

h.values_at(*order_array).compact
#=> [["ready", 5],
# ["shipped", 1],
# ["scheduled", 1],
# ["delivered", 23],
# ["canceled", 51],
# ["refunded", 1]]
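One caveat for the to_h-based variants above: to_h keeps only the last pair when a key repeats, so if x could ever contain two "ready" entries, one of them would be silently dropped. A quick check, using a made-up three-element input:

```ruby
# Hypothetical input with a repeated "ready" key
x = [["ready", 5], ["shipped", 1], ["ready", 8]]

# A later pair with the same key overwrites the earlier one,
# so only ["ready", 8] survives
h = x.to_h
p h
```

This doesn't affect the OP's data, where every status appears once.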
sawa

You're almost there: index isn't working because you're passing it the whole subarray rather than its first element, so it returns nil for every entry. This will work:

result = x.sort_by { |u| order_array.index(u[0]) || 100 }
#=> [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1], ["pending", 1]]

Please note, the || 100 default pushes any value not found in order_array to the back of the sort. This assumes order_array has fewer than 100 elements; || order_array.size would work for any length.
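A variation on the same idea, sketched here as an assumption rather than the answer's own code: precompute each status's position once in a hash (so index isn't called for every element) and use order_array.size as the fallback rank instead of a hard-coded 100.

```ruby
x = [
  ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1],
  ["delivered", 23], ["scheduled", 1], ["canceled", 51]
]
order_array = [
  "ready", "in_progress", "recieved", "shipped", "scheduled", "pick_up",
  "delivered", "canceled", "failed", "refunded", "refund_failed"
]

# Map each status to its position in order_array, e.g. "ready" => 0
rank = order_array.each_with_index.to_h

# Statuses missing from order_array get a rank after every known status
result = x.sort_by { |status, _count| rank.fetch(status, order_array.size) }
p result
```

Like the original, this keeps ["pending", 1] at the end. If several statuses were unknown, note that sort_by does not guarantee they keep their original relative order.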


Edit

This was initially accepted even though it includes ["pending", 1], which suggests that output fit the requirements. However, here's a solution that avoids the unwanted entry and also handles duplicates, should the need arise:

order_array.each_with_object([]) { |ordered_by, array| array.push(*x.select { |item| item[0] == ordered_by }) }
#=> [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1]]

Or, a very fast approach that still allows for duplicate values under each ordered key:

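hash = x.each_with_object(Hash.new { |h, k| h[k] = [] }) { |(key, value), h| h[key] << value }
# Rebuild one [key, value] pair per stored value; fetch avoids adding empty entries for missing keys
order_array.flat_map { |key| hash.fetch(key, []).map { |value| [key, value] } }
#=> [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1]]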
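To see duplicate handling in action, here's a sketch that swaps in group_by for the Hash.new accumulator (a different technique, not the answer's code) and runs it on a hypothetical input with two "ready" entries:

```ruby
# Hypothetical input with a repeated "ready" entry
x = [["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1], ["ready", 8]]
order_array = [
  "ready", "in_progress", "recieved", "shipped", "scheduled", "pick_up",
  "delivered", "canceled", "failed", "refunded", "refund_failed"
]

# Group the pairs by status; each group keeps the order the pairs had in x
groups = x.group_by(&:first)

# Emit groups in order_array's order; statuses absent from x contribute
# nothing, and statuses absent from order_array ("pending") are dropped
result = order_array.flat_map { |status| groups.fetch(status, []) }
p result
```

Both "ready" pairs come out together, in the order they appeared in x.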

Benchmark

Here's a benchmark for this scenario with a larger dataset: https://repl.it/repls/SentimentalAdequateClick. It looks like sawa's methods lead the way, though my last approach works well should there be duplicate keys in future. My second approach, on the other hand, is slow (which surprised me a little) :)

SRack
  • That does not give what the OP wanted. – sawa Mar 13 '19 at 11:38
  • Can see it's including `pending`, which I hadn't noticed was missing above @sawa. Your approach makes more sense if this is essential. Cheers for the comment. – SRack Mar 13 '19 at 11:42
  • What is the output that you really wanted? – sawa Mar 13 '19 at 11:44
  • I think this fits with the OP's question? The output is added to my answer: it matches the required output, albeit with `["pending", 1]` bumped to the back of the array. – SRack Mar 13 '19 at 11:47
  • As you have noted, `["pending", 1]` is added at the end, which the OP has not wanted (according to the question). – sawa Mar 13 '19 at 11:48
  • Yep, agree - I've upvoted yours, and mentioned above I prefer your approach due to that. Overlooked the mistake ahead of your comment. This seems to fit Selim's requirements, although yours is definitely the correct one as per the question. – SRack Mar 13 '19 at 11:50
  • Edited to include a working solution, which I think is pretty effective @SelimAlawwa. – SRack Mar 13 '19 at 13:17
  • Very good! Maybe you want to add the link to the benchmark directly in your answer. – iGian Mar 13 '19 at 18:00
  • @SRack, I noticed that the benchmark runs on Ruby 2.5, so the last two methods from sawa don't return the expected result. Tested on 2.6, they are still a lot faster. I edited my post with a further option. – iGian Mar 13 '19 at 19:38
  • Thanks @iGian - I'd tested the same on my machine with better results, though hadn't realised Repl.it was on 2.5. Appreciate the clarification. – SRack Mar 14 '19 at 09:33

assoc seems helpful: "Searches through an array whose elements are also arrays comparing obj with the first element of each contained array using obj.==."

order_array.map{|e| x.assoc(e) }.compact
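A quick look at what assoc returns for the two cases that matter here: a status present in x and one that isn't.

```ruby
x = [
  ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1],
  ["delivered", 23], ["scheduled", 1], ["canceled", 51]
]

found   = x.assoc("shipped")      # first subarray whose first element == "shipped"
missing = x.assoc("in_progress")  # no match, so nil; this is why .compact is needed
p found, missing
```

Since assoc stops at the first match, it would return only one entry per status if x ever contained duplicates, and each call scans x from the start.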
steenslag

I'd suggest filtering out the subarrays whose first element is not in order_array (for example "pending") before sorting:

x.select { |e| order_array.include? e[0] }.sort_by { |e| order_array.index(e[0]) }
#=> [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1]]
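Worth knowing when filtering this way: keep_if filters its receiver in place, so calling it on x removes ["pending", 1] from x itself (which could matter if x is reused afterwards, e.g. across benchmark iterations), whereas select returns a new array and leaves x alone. A minimal demonstration on a made-up two-element array:

```ruby
original = [["ready", 5], ["pending", 1]]
known = ["ready"]

# select builds a new array; original still has both elements
selected = original.select { |e| known.include?(e[0]) }
size_after_select = original.size

# keep_if deletes the non-matching elements from original itself
original.keep_if { |e| known.include?(e[0]) }
p selected, size_after_select, original
```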


Benchmarked the answers so far, 500 iterations each:
#        user       system     total       real
# sawa   0.006698   0.000132   0.006830 (  0.006996) # on the first method
# ray    0.005543   0.000123   0.005666 (  0.005770)
# igian  0.001923   0.000003   0.001926 (  0.001927)
# srack  0.005270   0.000168   0.005438 (  0.005540) # on the last method
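A comparable harness can be sketched with Ruby's standard benchmark library. This is an assumption about how such numbers could be produced, not necessarily the code behind the timings above; the input is the OP's small array and the 500 iterations mirror this answer. It checks that each candidate returns the required array before timing it.

```ruby
require "benchmark"

x = [
  ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1],
  ["delivered", 23], ["scheduled", 1], ["canceled", 51]
]
order_array = [
  "ready", "in_progress", "recieved", "shipped", "scheduled", "pick_up",
  "delivered", "canceled", "failed", "refunded", "refund_failed"
]

# Candidate implementations adapted from the answers on this page
candidates = {
  "to_h + map"  => -> { h = x.to_h; order_array.map { |k| [k, h[k]] if h.key?(k) }.compact },
  "assoc"       => -> { order_array.map { |e| x.assoc(e) }.compact },
  "detect"      => -> { order_array.map { |s| x.detect { |y| y[0] == s } }.compact },
  "select+sort" => -> { x.select { |e| order_array.include?(e[0]) }.sort_by { |e| order_array.index(e[0]) } }
}

expected = [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1]]

# Fail fast if any candidate returns something other than the required array
candidates.each { |name, fn| raise "#{name} is wrong" unless fn.call == expected }

n = 500
Benchmark.bm(12) do |bm|
  candidates.each { |name, fn| bm.report(name) { n.times { fn.call } } }
end
```

Timings on an input this small are noisy; a larger, generated dataset gives a clearer picture of how each approach scales.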


Just for fun, I tried to find a faster method for Ruby 2.5:
xx = x.to_h # plain to_h (no block), so it also works before Ruby 2.6
order_array.each.with_object([]) { |k, res| res << [k, xx[k]] if xx.has_key? k }
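Along the same lines, sawa's values_at variant can be made to work before Ruby 2.6 by building the pairs with map first, since a block passed to Array#to_h is ignored on those versions. A sketch of that substitution:

```ruby
x = [
  ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1],
  ["delivered", 23], ["scheduled", 1], ["canceled", 51]
]
order_array = [
  "ready", "in_progress", "recieved", "shipped", "scheduled", "pick_up",
  "delivered", "canceled", "failed", "refunded", "refund_failed"
]

# Build {key => [key, value]} with map + plain to_h (no block needed)
h = x.map { |k, v| [k, [k, v]] }.to_h

# values_at returns nil for statuses missing from x; compact drops them
result = h.values_at(*order_array).compact
p result
```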
iGian
  • This was my thought as an edit for mine when I realised my mistake, though it's more complex than @sawa's answer, so I didn't think it worthwhile. Still pretty readable though. – SRack Mar 13 '19 at 11:58
  • I found it loops twice (increasing complexity), and `index` is also quite slow on a large array. – ray Mar 13 '19 at 12:26
  • @ray, actually the benchmark surprised me. Maybe you can double check? I considered the array to be small for the OP's case. – iGian Mar 13 '19 at 15:47
  • @iGian credit where it's due - this does outperform both Ray's and my second answer, even as it scales. I've benchmarked with a larger dataset ([here](https://repl.it/repls/SentimentalAdequateClick)) and it's basically possible to conclude Sawa is king :) – SRack Mar 13 '19 at 17:15

You can try the code below to get the output efficiently:

order_array.map { |p| x.detect { |y| y[0] == p } }.compact
# => [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1]]
ray

I've assumed:

  • the first element of each element of x is not necessarily unique;
  • all elements of x whose first element is the same and whose first element is a member of order_array appear consecutively in the returned (sorted) array in the order in which those elements appear in x;
  • any elements of x whose first element is not a member of order_array appear in the returned (sorted) array after all elements whose first element is in order_array, and all such elements appear in the returned array (at the end) in the order in which they occur in x; and
  • efficiency is paramount.

x = [
  ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1], ["originated", 3],
  ["delivered", 23], ["scheduled", 1], ["ready", 8], ["canceled", 51]
]

order_array = [
  "ready", "in_progress", "received", "shipped", "scheduled", "pick_up",
  "delivered", "canceled", "failed", "refunded", "refund_failed"
]

order_pos = order_array.each_with_object({}) { |word,h| h[word] = [] }
  #=> {"ready"=>[], "in_progress"=>[], "received"=>[], "shipped"=>[],
  #    "scheduled"=>[], "pick_up"=>[], "delivered"=>[], "canceled"=>[],
  #    "failed"=>[], "refunded"=>[], "refund_failed"=>[]} 
back = x.each_with_index.with_object([]) { |((word,v),i),back|
  order_pos.key?(word) ? (order_pos[word] << i) : back << [word,v] }
  #=> [["pending", 1], ["originated", 3]] 
order_pos.flat_map { |word,offsets| offsets.map { |i| x[i] } }.concat(back)
  #=> [["ready", 5], ["ready", 8], ["shipped", 1], ["scheduled", 1],
  #    ["delivered", 23], ["canceled", 51], ["refunded", 1], ["pending", 1],
  #    ["originated", 3]] 

Note:

order_pos
  #=> {"ready"=>[0, 7], "in_progress"=>[], "received"=>[], "shipped"=>[1],
  #    "scheduled"=>[6], "pick_up"=>[], "delivered"=>[5], "canceled"=>[8],
  #    "failed"=>[], "refunded"=>[3], "refund_failed"=>[]} 

It is necessary to initialise order_pos with every element of order_array so that its keys are ordered by order_array. This is an example of the worth of a controversial change made in Ruby 1.9, which guaranteed that hashes keep their keys in insertion order.
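To see why seeding matters, here's a small sketch comparing a hash whose keys are added lazily as they're encountered in x with one pre-seeded from order_array (the inputs are made up for illustration):

```ruby
x = [["shipped", 1], ["ready", 5], ["pending", 1]]
order_array = ["ready", "in_progress", "shipped"]

# Lazily built: keys appear in the order they are first seen in x
lazy = Hash.new { |h, k| h[k] = [] }
x.each_with_index { |(word, _v), i| lazy[word] << i }

# Pre-seeded: keys appear in order_array's order, regardless of x
seeded = order_array.each_with_object({}) { |word, h| h[word] = [] }
x.each_with_index { |(word, _v), i| seeded[word] << i if seeded.key?(word) }

p lazy.keys, seeded.keys
```

Only the seeded hash can be walked with flat_map to produce the elements in order_array's order.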

Cary Swoveland