24

Is there something in Python similar to PHP's natsort function?

l = ['image1.jpg', 'image15.jpg', 'image12.jpg', 'image3.jpg']
l.sort()

gives:

['image1.jpg', 'image12.jpg', 'image15.jpg', 'image3.jpg']

but I would like to get:

['image1.jpg', 'image3.jpg', 'image12.jpg', 'image15.jpg']

UPDATE

Solution based on this link

import functools
import re

def try_int(s):
    "Convert to integer if possible."
    try:
        return int(s)
    except ValueError:
        return s

def natsort_key(s):
    "Used internally to get a list by which s is sorted."
    return list(map(try_int, re.findall(r'(\d+|\D+)', s)))

def natcmp(a, b):
    "Natural string comparison, case sensitive."
    ka, kb = natsort_key(a), natsort_key(b)
    return (ka > kb) - (ka < kb)  # equivalent of cmp(), which Python 3 removed

def natcasecmp(a, b):
    "Natural string comparison, ignores case."
    return natcmp(a.lower(), b.lower())

# list.sort() no longer takes a comparison function in Python 3
l.sort(key=functools.cmp_to_key(natcasecmp))
smci
Silver Light
  • It's a natural order, image3.jpg is in its place – Silver Light Mar 30 '10 at 13:40
  • Not builtin, and not in the standard library AFAIK. There's a recipe for it [here](http://code.activestate.com/recipes/285264-natural-string-sorting/), and other implementations can be found by Google. – Eli Bendersky Mar 30 '10 at 13:31
  • You can check this link: [Compact python human sort](http://nedbatchelder.com/blog/200712.html#e20071211T054956) – sankoz Mar 30 '10 at 13:34

3 Answers

52

From my answer to Natural Sorting algorithm:

import re
def natural_key(string_):
    """See https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/"""
    return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', string_)]

Example:

>>> L = ['image1.jpg', 'image15.jpg', 'image12.jpg', 'image3.jpg']
>>> sorted(L)
['image1.jpg', 'image12.jpg', 'image15.jpg', 'image3.jpg']
>>> sorted(L, key=natural_key)
['image1.jpg', 'image3.jpg', 'image12.jpg', 'image15.jpg']

To support Unicode strings, .isdecimal() should be used instead of .isdigit(). See example in @phihag's comment. Related: How to reveal Unicodes numeric value property.

.isdigit() may also return True for a character that int() does not accept: this happens for a bytestring on Python 2 in some locales, e.g., '\xb2' ('²') in the cp1252 locale on Windows.
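A minimal sketch of the .isdecimal() variant mentioned above, assuming Python 3 (where every str is Unicode). The name natural_key_unicode is hypothetical and not part of the original answer; it only illustrates why .isdecimal() is the safer test before calling int().

```python
import re

def natural_key_unicode(string_):
    # Hypothetical variant of natural_key: str.isdecimal() instead of str.isdigit().
    # isdecimal() is only true for characters that int() can actually parse.
    return [int(s) if s.isdecimal() else s for s in re.split(r'(\d+)', string_)]

# '²' (SUPERSCRIPT TWO) passes isdigit() but not isdecimal(),
# and int('²') raises ValueError.
print('²'.isdigit(), '²'.isdecimal())   # True False
print(natural_key_unicode('²'))          # ['²']
print(natural_key_unicode('image12.jpg'))  # ['image', 12, '.jpg']
```

With the .isdigit() version, the lone string '²' would reach int() and raise ValueError; here it is simply kept as text.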

Elijah
jfs
  • @phihag: It works on Python 3. – jfs May 21 '12 at 16:25
  • 1
    Oops, you're totally right. I messed up the test case - the error has nothing to do with Python 3. `\d` and `isdigit` just match values that `int` does not accept. [Observe `[u'²'].sort(key=natural_key)`](http://ideone.com/iMEmv). – phihag May 21 '12 at 16:40
  • Caveat: works for the specific example shown but fails for cases like ['elm1', 'Elm2'] and ['0.501', '0.55'] and [0.01, 0.1, 1] ... see http://stackoverflow.com/questions/4836710/does-python-have-a-built-in-function-for-string-natural-sort/27430141#27430141 for lower() and my more general solution for Python natural sort order. – Scott Lawton Dec 11 '14 at 18:49
  • 1
    @ScottLawton: it works as expected. It is ok to use different definitions of what "natural sorting" is. It is not ok to tell that other (widely used) definitions are wrong. – jfs Dec 11 '14 at 19:48
  • May I continue to ask, that if my array is a 2d array like ```[['image1.jpg', 'pathToImage1'], ['image15.jpg', 'pathToImage15'], ['image12.jpg', 'pathToImage12'], ['image3.jpg', 'pathToImage3']]```, and I want it to be sorted the same way(sort by the numeric value or the first element of each sub array, returned ```[['image1.jpg', 'pathToImage1'], ['image3.jpg', 'pathToImage3'], ['image12.jpg', 'pathToImage12'], ['image15.jpg', 'pathToImage15']]```), where should I tune this code to work? Thanks! (Do I need to open a new post for this question?) – Hang Oct 20 '18 at 02:02
  • @Hang: it is a very simple variation:`sorted(L, lambda sublist: natural_key(sublist[0]))` If it is unclear, work through [Sorting HOW TO](https://docs.python.org/3/howto/sorting.html) examples. – jfs Oct 20 '18 at 04:41
  • Thanks @jfs! I read through the examples and changed ```lambda sublist: natural_key(sublist[0])``` to ```key=lambda sublist: natural_key(sublist[0])``` so the code could run, but it seems like the order of the sublists doesn't get changed at all. I will try more and put feedback here :D PS: a repl here https://repl.it/@hanglearning/testSortSublists – Hang Oct 20 '18 at 18:21
  • @Hang: `sorted(L)` *returns* a new list (`L` is not changed). `L.sort()` modifies the list in place (`L` is changed). It is said at the very top of the link that I've provided (under "Sorting Basics" header). – jfs Oct 28 '18 at 06:57
  • @jfs Oh! Sorry my bad! That's right! Make it equal to a new list and now it works!!! Thanks!!! – Hang Nov 02 '18 at 03:02
  • Great stuff! natsort from PyPI is great, too, but with this I just have to add a single line of code instead of a whole new package to my app. And it absolutely does the job for file version comparison à la `major_minor_patch`. – Jeronimo Aug 06 '21 at 10:21
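The sublist case discussed in the comments above can be sketched as follows, assuming Python 3. The variable names rows and by_name are illustrative only; natural_key is repeated from the answer so the snippet runs on its own.

```python
import re

def natural_key(string_):
    return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', string_)]

rows = [['image1.jpg', 'pathToImage1'], ['image15.jpg', 'pathToImage15'],
        ['image12.jpg', 'pathToImage12'], ['image3.jpg', 'pathToImage3']]

# sorted() returns a new list and leaves rows untouched;
# the key must be passed by keyword.
by_name = sorted(rows, key=lambda sublist: natural_key(sublist[0]))
print([row[0] for row in by_name])
# ['image1.jpg', 'image3.jpg', 'image12.jpg', 'image15.jpg']

# list.sort() reorders rows itself and returns None.
rows.sort(key=lambda sublist: natural_key(sublist[0]))
```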
18

You can check out the third-party natsort library on PyPI:

>>> import natsort
>>> l = ['image1.jpg', 'image15.jpg', 'image12.jpg', 'image3.jpg']
>>> natsort.natsorted(l)
['image1.jpg', 'image3.jpg', 'image12.jpg', 'image15.jpg']

Full disclosure, I am the author.

SethMMorton
  • I wanted to use it, but I didn't find it for python 3.5 – FiReTiTi Apr 12 '17 at 01:02
  • @FiReTiTi It is compatible with both python 2 and python 3. I am curious how you concluded that it is not available for python 3. – SethMMorton Apr 12 '17 at 01:10
  • I tried to use it and natsort was not available. So I asked MacPorts to install it, but it wanted to force me to install Python 3.4 or 2.7 along with natsort, which I don't want because Python 3.5 is already installed. – FiReTiTi Apr 12 '17 at 17:47
  • 1
    @FiReTiTi It sounds like something to report to the MacPorts folks. natsort works on all modern versions of Python. You can use pip, or if you are on Mac I would consider changing to Homebrew. – SethMMorton Apr 12 '17 at 18:24
2

This function can be used as the key= argument for sorted in Python 2.x and 3.x:

import re

def sortkey_natural(s):
    # Runs of ASCII digits become ints; everything else stays a string.
    return tuple(int(part) if re.match(r'[0-9]+$', part) else part
                 for part in re.split(r'([0-9]+)', s))
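A short usage sketch for the key function above, assuming Python 3. The list names and the version-style strings are made up for illustration.

```python
import re

def sortkey_natural(s):
    return tuple(int(part) if re.match(r'[0-9]+$', part) else part
                 for part in re.split(r'([0-9]+)', s))

names = ['image1.jpg', 'image15.jpg', 'image12.jpg', 'image3.jpg']
result = sorted(names, key=sortkey_natural)
print(result)  # ['image1.jpg', 'image3.jpg', 'image12.jpg', 'image15.jpg']

# The same key also orders version-like strings numerically.
versions = sorted(['v1_10_2', 'v1_9_0'], key=sortkey_natural)
print(versions)  # ['v1_9_0', 'v1_10_2']
```

Because only [0-9] is matched, non-ASCII digits are left as text and never passed to int().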
phihag
  • `.isdecimal()` is a Unicode-only method. It won't work on bytestrings. `.isdecimal()` matches the same set of characters ([Nd]) as `\d`, which is larger than `[0-9]` in the Unicode case. – jfs May 21 '12 at 20:08
  • I have no idea what the semantics of sorting two byte strings would be, so I didn't consider it. But you're right, the test is faulty. Switched to `re.match`. – phihag May 21 '12 at 20:20
  • +1. You don't use [proper Unicode sorting](http://www.unicode.org/reports/tr10/) so I don't see why you would reject bytestrings. btw, On *nix filenames are just bytes. You don't want `ls` to break just because there is a funny filename in a directory. – jfs May 21 '12 at 20:57