
If I have a list of strings, e.g. ["a143.txt", "a9.txt"], how can I sort it in ascending order by the numbers in the strings rather than by the strings themselves? I.e. I want "a9.txt" to appear before "a143.txt", since 9 < 143.

Thanks.

Maciej Ziarko
  • This question does not appear to have anything to do with `scipy` or `numpy`. If this is the case, please remove those tags. – JoshAdel Mar 30 '11 at 20:35
  • Edited tags. Now it's clearer. – Maciej Ziarko Mar 30 '11 at 20:45
  • possible duplicate of [How do you sort files numerically?](http://stackoverflow.com/questions/4623446/how-do-you-sort-files-numerically) – Daniel DiPaolo Mar 30 '11 at 21:18
  • Possible duplicate of [Python analog of natsort function (sort a list using a "natural order" algorithm)](http://stackoverflow.com/questions/2545532/python-analog-of-natsort-function-sort-a-list-using-a-natural-order-algorithm) – tommy.carstensen Mar 20 '17 at 16:55

5 Answers


It's called "natural sort order". From http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html:

Try this:

import re 

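def sort_nicely( l ):
  """ Sort the given list in the way that humans expect.
  """
  # Turn runs of digits into ints so they compare numerically; leave other text as-is.
  convert = lambda text: int(text) if text.isdigit() else text
  # re.split with a capturing group keeps the digit runs: "a143.txt" -> ['a', '143', '.txt']
  alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ]
  l.sort( key=alphanum_key )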
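The same key can be written as an ordinary function and tried on a few names. This is a sketch; the name `natural_key` is ours, not from the linked article. It also runs on Python 3: because `re.split` with a capturing group always puts the non-digit chunks at even positions and the digit runs at odd positions, two keys only ever compare int with int and str with str.

```python
import re

def natural_key(text):
    # Digit runs become ints so they compare numerically; other chunks stay strings.
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"([0-9]+)", text)]

names = ["a143.txt", "a9.txt", "a10.txt"]
print(natural_key("a143.txt"))         # ['a', 143, '.txt']
print(sorted(names, key=natural_key))  # ['a9.txt', 'a10.txt', 'a143.txt']
```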
wmil
  • I'd use `text.lower()` at the end of the `convert = ` line to make it case-insensitive. – kindall Mar 30 '11 at 20:36
  • +1. You might want to replace the lambda with a proper function definition, for readability. Incidentally, Debian package version numbers are compared more or less like this. http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version – ʇsәɹoɈ Mar 30 '11 at 20:44
  • +1 Nice answer. The only thing I didn't like is the extra whitespace, e.g. here: `[ convert(c) for c in re.split('([0-9]+)', key) ]`, `l.sort( key=alphanum_key )` and `sort_nicely( l )`. – Maciej Ziarko Mar 30 '11 at 20:56
  • +1, nicely done! I redid alphanum_key as `alphanum_key = lambda key: map(convert, re.split('([0-9]+)', key))`. – PaulMcG Mar 30 '11 at 23:39
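Combining the two suggestions above (lowercasing non-digit text, and building the key with `map`) needs one adjustment on Python 3: `map` returns a lazy iterator there, and iterators can't be compared with `<`, so the result has to be wrapped in `list(...)`. A sketch, using proper function definitions instead of lambdas:

```python
import re

def convert(text):
    # Digit runs compare as numbers; everything else compares case-insensitively.
    return int(text) if text.isdigit() else text.lower()

def alphanum_key(key):
    # list(...) is needed on Python 3, where map() returns an iterator that can't be compared.
    return list(map(convert, re.split(r"([0-9]+)", key)))

names = ["B2.txt", "a10.txt", "A9.txt"]
print(sorted(names, key=alphanum_key))  # ['A9.txt', 'a10.txt', 'B2.txt']
```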

Use list.sort() and provide your own function for the key argument. Your function is called once for each item in the list (with that item as its argument) and should return the value to sort by; the list is then ordered by those returned values rather than by the items themselves.

See http://wiki.python.org/moin/HowTo/Sorting/#Key_Functions for more information.
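As a rough illustration of what `key=` does (the function name `first_number` is hypothetical, and it assumes every name contains a digit): the key function is called once per item, and the items are ordered by the values it returns. Spelling the same thing out by hand as "decorate-sort-undecorate" gives the same result; the index in each tuple keeps equal keys in their original order and stops the comparison from ever reaching the strings themselves.

```python
import re

def first_number(name):
    # Hypothetical key function: the first run of digits in the name, as an int.
    return int(re.search(r"\d+", name).group())

files = ["a143.txt", "a9.txt", "a20.txt"]

# key= calls first_number once per item and sorts by the returned values.
by_key = sorted(files, key=first_number)

# Roughly the same thing done by hand: decorate, sort, undecorate.
decorated = [(first_number(f), i, f) for i, f in enumerate(files)]
decorated.sort()
by_hand = [f for _, _, f in decorated]

print(by_key)             # ['a9.txt', 'a20.txt', 'a143.txt']
print(by_key == by_hand)  # True
```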

bradley.ayers

If you want to disregard the non-numeric parts of the strings completely and sort only by the first number in each, you can do:

import re
numre = re.compile('[0-9]+')
def extractNum(s):
    return int(numre.search(s).group())

myList = ["a143.txt", "a9.txt", ]
myList.sort(key=extractNum)
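One caveat: `numre.search(s)` returns `None` for a string with no digits, so `extractNum` raises `AttributeError` on a name like "readme.txt". A sketch of a variant with a fallback (the `default` parameter and the choice to sort such names last are our assumptions, not part of the answer):

```python
import re

numre = re.compile(r"[0-9]+")

def extract_num(s, default=float("inf")):
    # Same idea as extractNum above, but names without digits get `default`
    # (assumed here: infinity, so they sort last) instead of raising.
    match = numre.search(s)
    return int(match.group()) if match else default

my_list = ["a143.txt", "readme.txt", "a9.txt"]
my_list.sort(key=extract_num)
print(my_list)  # ['a9.txt', 'a143.txt', 'readme.txt']
```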
highBandWidth

list.sort() is deprecated (see the Python.org How-To); sorted(list, key=keyfunc) is better.

import re

def sortFunc(item):
    # Sort key: the digits that follow a letter, e.g. "a143.txt" -> 143
    return int(re.search(r'[a-zA-Z](\d+)', item).group(1))

myList = ["a143.txt", "a9.txt"]

print(sorted(myList, key=sortFunc))
Prydie
  • list.sort() is deprecated? "Usually it's less convenient than sorted()" is the only thing in this direction I found. I have to say, though, that I'd be more than happy to see the in-place sorting go away, but it seems unlikely. – tokland Mar 30 '11 at 20:59
  • It is not deprecated. http://docs.python.org/library/stdtypes.html#mutable-sequence-types – Maciej Ziarko Mar 30 '11 at 21:03
  • It may not be technically deprecated, but it is considered the "old" method and is labeled as such on Python.org. – Prydie Mar 30 '11 at 21:10
  • True, `list.sort()` is slightly more memory-efficient, but the difference is negligible for sensibly sized lists. I can't find a good explanation for why, but as far as I am aware the preferred and more 'pythonic' way of sorting is using `sorted()`. – Prydie Mar 30 '11 at 21:35
  • It says `sorted()` returns a NEW list, not that it's NEW. It's definitely less efficient if you really want to modify your list in place. `sorted()` is good for tuples. – Maciej Ziarko Mar 30 '11 at 21:36
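To make the difference discussed in these comments concrete, here is a small sketch reusing the answer's idea (the key is renamed `sort_func` and uses a simpler `\d+` pattern): `sorted()` builds and returns a new list and accepts any iterable, such as a tuple, while `list.sort()` reorders the list in place and returns `None`.

```python
import re

def sort_func(item):
    # Sort key: the first run of digits in the name, as an int.
    return int(re.search(r"\d+", item).group())

my_list = ["a143.txt", "a9.txt"]

new_list = sorted(my_list, key=sort_func)  # new list; my_list is left unchanged
print(my_list, new_list)  # ['a143.txt', 'a9.txt'] ['a9.txt', 'a143.txt']

result = my_list.sort(key=sort_func)  # sorts my_list in place
print(my_list, result)    # ['a9.txt', 'a143.txt'] None

# Tuples have no .sort() method, but sorted() accepts any iterable and returns a list.
print(sorted(("a143.txt", "a9.txt"), key=sort_func))  # ['a9.txt', 'a143.txt']
```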
>>> import re
>>> paths = ["a143.txt", "a9.txt"]
>>> sorted(paths, key=lambda s: int(re.search(r"\d+", s).group()))
['a9.txt', 'a143.txt']

More generically, if you also want it to work for names like a100_32_12 (sorting by each group of digits in turn):

>>> paths = ["a143_2.txt", "a143_1.txt"]
>>> sorted(paths, key=lambda s: list(map(int, re.findall(r"\d+", s))))
['a143_1.txt', 'a143_2.txt']
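A slightly longer sketch of the same idea with a named key function (`numeric_groups` is our name). Lists compare element by element, so names are ordered by their first number, then their second, and so on; a name whose numbers are a prefix of another's sorts first. Non-digit text is ignored entirely, so e.g. "x1.txt" and "y1.txt" get equal keys and simply keep their original relative order.

```python
import re

def numeric_groups(s):
    # Every run of digits in the name, as ints, in order of appearance.
    return [int(g) for g in re.findall(r"\d+", s)]

paths = ["a100_32_12.txt", "a100_4_7.txt", "a100_32_2.txt", "a9.txt"]
print(sorted(paths, key=numeric_groups))
# ['a9.txt', 'a100_4_7.txt', 'a100_32_2.txt', 'a100_32_12.txt']
```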
tokland