18

I have a list of files with a bunch of attributes. One of the attributes is the file name which is how I would like to sort the list. However, the list goes something like this: filename 1, filename 2, filename 10, filename 20.

Ruby's sort_by method produces this:

files = files.sort_by { |file| file.name }
=> [filename 1, filename 10, filename 2, filename 20]

I would like a more human readable list like filename 1, filename 2, filename 10, filename 20

I found the natural_sort gem but it seems to only work like the sort method. I need something where I can specify what to sort the array by.

Any help?

Nate Bird

7 Answers

29

Here's another take on a "natural" sort method:

class String
  def naturalized
    scan(/[^\d\.]+|[\d\.]+/).collect { |f| f.match(/\d+(\.\d+)?/) ? f.to_f : f }
  end
end

This converts something like "Filename 10" into a simple array with floats in place of numbers: [ "Filename ", 10.0 ] (the trailing space stays with the text part).

You can use this on your list:

files.sort_by! { |file| file.name.to_s.naturalized }

This has the advantage of working on arbitrary numbers in unpredictable positions. The paranoid .to_s call in that block is to ensure that there is a string and not an inadvertent nil when sorting.
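As a quick check of the approach, here is a minimal sketch that applies the same scan-and-convert logic without reopening String. FileEntry and naturalize are hypothetical names used only for this illustration; the question's real file objects are assumed to respond to name.

```ruby
# Hypothetical stand-in for the file objects in the question.
FileEntry = Struct.new(:name)

# Same logic as String#naturalized, written as a standalone lambda
# so that the core String class is left untouched.
naturalize = lambda do |str|
  str.to_s.scan(/[^\d\.]+|[\d\.]+/).collect do |f|
    f.match(/\d+(\.\d+)?/) ? f.to_f : f
  end
end

files = ["filename 10", "filename 2", "filename 20", "filename 1"]
        .map { |n| FileEntry.new(n) }

# Each name becomes a key such as ["filename ", 10.0]; arrays compare
# element by element, so the numeric part is compared as a number.
sorted_names = files.sort_by { |file| naturalize.call(file.name) }.map(&:name)
# sorted_names => ["filename 1", "filename 2", "filename 10", "filename 20"]
```

Note that a key can still fail to compare if a number in one name lines up with text in another, because Ruby cannot compare a Float with a String.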

tadman
  • Wow thats magic. In my use case, identifiers may be separated by ".". Thus I remove the two '\.' in the regexp used in scan(). I don't think this could break anything. – Yannick Wurm Mar 16 '11 at 05:56
  • It will mean that any values with a decimal place will be interpreted as separate numbers. 10.2 will come after 10.1 but before 10.11. – tadman Mar 16 '11 at 18:55
  • Right, I had to remove the \.'s (and the third one!) as well, else 3.3 and 3.25 sorted wrong. So: scan(/[^\d]+|[\d]+/).collect { |i| i.match(/\d+/) ? i.to_i : i } – Dan Kegel Jun 16 '16 at 17:03
  • 1
    And then float seemed like the wrong thing to convert the words to. So: scan(/[^\d]+|[\d]+/).collect { |w| w.match(/\d+/) ? w.to_i : w } – Dan Kegel Jun 16 '16 at 17:10
  • If you don't want whitespace you could add `\s` to the first regex matcher exclude to be like: `irb(main):021:0> "Filename 10".scan(/[^\s\d\.]+|[\d\.]+/)` produces `=> ["Filename", "10"]`. Otherwise I see `irb(main):020:0> "Filename 10".scan(/[^\d\.]+|[\d\.]+/)` produces `=> ["Filename ", "10"]` (notice the space after "Filename") – cdmo Feb 14 '20 at 16:44
19

A generic natural sort for strings:

array.sort_by {|e| e.split(/(\d+)/).map {|a| a =~ /\d+/ ? a.to_i : a }}
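To see why this one-liner holds up on mixed input, here is a small sketch that wraps it in a helper. natural_key is a hypothetical name, and the sample list is only illustrative. Because split is given a capturing group, the pieces always alternate between non-digit text (even positions, possibly an empty string) and digit runs (odd positions), so two keys never compare an Integer against a String.

```ruby
# Hypothetical helper wrapping the one-liner above.
def natural_key(str)
  # split with a capturing group keeps the digit runs in the result:
  # "a10b" => ["a", "10", "b"], "10a" => ["", "10", "a"]
  str.to_s.split(/(\d+)/).map { |part| /\A\d+\z/.match?(part) ? part.to_i : part }
end

list = ["a10", "a", "a20", "a1b", "a1a", "a2", "a0", "a1"]
sorted = list.sort_by { |s| natural_key(s) }
# sorted => ["a", "a0", "a1", "a1a", "a1b", "a2", "a10", "a20"]

# Text next to a number is still comparable:
pair = ["aa", "a1"].sort_by { |s| natural_key(s) }
# pair => ["a1", "aa"]
```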
shurikk
  • 1
    This fails on simple arrays like `["a1", "aa" ]` because `[ "a", 1 ] <=> [ "a", "a" ]` returns nil, and `sort_by` does not like that. (I'm not sure why that's returning nil.) – Zach Wily Mar 05 '13 at 18:08
  • supposed to be \d+, my bad – shurikk Mar 06 '13 at 07:01
9

I've created a natural sort gem. It can sort by an attribute like this:

# Sort an array of objects by the 'number' attribute
Thing = Struct.new(:number, :name)
objects = [
  Thing.new('1.1', 'color'),
  Thing.new('1.2', 'size'),
  Thing.new('1.1.1', 'opacity'),
  Thing.new('1.1.2', 'lightness'),
  Thing.new('1.10', 'hardness'),
  Thing.new('2.1', 'weight'),
  Thing.new('1.3', 'shape')
  ]
Naturally.sort_by(objects, :number)

# => [#<struct Thing number="1.1", name="color">,
      #<struct Thing number="1.1.1", name="opacity">,
      #<struct Thing number="1.1.2", name="lightness">,
      #<struct Thing number="1.2", name="size">,
      #<struct Thing number="1.3", name="shape">,
      #<struct Thing number="1.10", name="hardness">,
      #<struct Thing number="2.1", name="weight">]
Dogweather
6

As long as files are always named "file #", you could do

files.sort_by{|f| f.name.split(" ")[1].to_i }

This splits on the space, and grabs the number to do the sorting.
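A short sketch of this approach and its main limitation: sorting on the trailing number alone ignores everything before it. FileEntry and the sample names are hypothetical. The second key, which sorts on the text first and the number second, is one possible way around that and is not part of the original suggestion.

```ruby
# Hypothetical stand-in for the file objects in the question.
FileEntry = Struct.new(:name)

files = ["zoo file 1", "awesomefile 50", "awesomefile 7"]
        .map { |n| FileEntry.new(n) }

# Sort on the trailing number only; the text prefix plays no part.
by_number = files.sort_by { |f| f.name.split.last.to_i }
number_only = by_number.map(&:name)
# number_only => ["zoo file 1", "awesomefile 7", "awesomefile 50"]

# Sort on the text first, then on the trailing number.
by_prefix = files.sort_by do |f|
  *words, number = f.name.split
  [words.join(" "), number.to_i]
end
prefix_then_number = by_prefix.map(&:name)
# prefix_then_number => ["awesomefile 7", "awesomefile 50", "zoo file 1"]
```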

William
  • [1] returns the second item of the array returned by split, in this case, the number. – Teoulas Nov 02 '10 at 15:17
  • 2
    Alternatively, you could use `.last` instead of `[1]`, so `files.sort_by{|f| f.name.split(" ").last.to_i }` – William Nov 02 '10 at 15:23
  • 1
    Also, alternatively, `split` assumes whitespace as the pattern to split on, so `files.sort_by{|f| f.name.split.last.to_i }` will work as well. Just to tidy things up a bit :) – William Nov 02 '10 at 15:24
  • I used the .last method since there are a number of spacing breaks in the filenames. – Nate Bird Nov 02 '10 at 15:43
  • no problem, keep in mind that this method might not be the best if `'filename'` is not always what files start with, since it strictly sorts on the numbers, so "awesomefile 50" will appear after "zoo file 1". – William Nov 02 '10 at 15:48
  • Right. I have other sorting criteria as well but I just put up the basic example to understand the principles. Thanks! – Nate Bird Nov 02 '10 at 15:51
2

Natural Sort gem.

Install

gem "natural_sort"

Usage

list = ["a10", "a", "a20", "a1b", "a1a", "a2", "a0", "a1"]
list.sort(&NaturalSort) # => ["a", "a0", "a1", "a1a", "a1b", "a2", "a10", "a20"]
Joshua Pinter
  • Damn, coming back to this and this is still very much the way to go. Excellent little gem for naturally sorting things. – Joshua Pinter Dec 16 '20 at 17:47
0
array.sort_by { |x| (x.is_a?(Array) ? x.join(" ") : x.to_s).split(/(\d+)/).map(&:strip).reject(&:empty?).map { |part| part =~ /\d+/ ? part.rjust(30) : part } }

This lets sort_by compare the generated keys even when the items at matching positions are of different types, and it also handles nested arrays. Example:

[ "3  a   22", "b  22     1", "   b  5  ", [11, 2, [4, 5]] ] #=>
[ "3  a   22", [11, 2, [4, 5]], "   b  5  ", "b  22     1" ]

The point is that any item that is a (possibly nested) array is converted to a string before sorting. Parts of the string that consist only of digits are not converted to numeric values; instead they are left-padded with spaces, like:

30 #=> "                         30"

This way every part of every key is a string, so the sort can always compare them, and parts that are numbers at matching positions still sort in numeric order.

horv77
-3

It is sorting correctly. The problem is that the names do not lend themselves to the order you want. Compared as strings, 10 comes before 2 and 21 comes before 5.

If you want them sorted as numbers, you have two approaches:

1 - Rename all your entries so that single-digit numbers get a leading 0.

2 - Do as William suggested: split the name, convert the string to an integer, and sort by it.

I would recommend option 1, since the second relies on the names following a standard pattern.
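If renaming the entries is not practical, a similar effect can be had by zero-padding the numbers only inside the sort key, leaving the names themselves unchanged. This is a minimal sketch of that variation, not the renaming itself. It assumes no number in a name is longer than 10 digits; WIDTH and padded_key are illustrative names.

```ruby
names = ["filename 10", "filename 2", "filename 20", "filename 1"]

# Assumed upper bound on the number of digits in any number.
WIDTH = 10

# Replace every digit run with a zero-padded copy, e.g.
# "filename 2" => "filename 0000000002", so plain string
# comparison orders the numbers correctly.
padded_key = ->(name) { name.gsub(/\d+/) { |digits| digits.rjust(WIDTH, "0") } }

sorted = names.sort_by { |name| padded_key.call(name) }
# sorted => ["filename 1", "filename 2", "filename 10", "filename 20"]
```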

Paulo Henrique