
I have thousands of images in a folder. The images are named 0.png, 1.png, 2.png, and so on.

I wrote the following code to generate an average image for positive samples and similarly for negative samples.

import glob
import numpy as np

file_list = glob.glob(trainDir)
n = len(file_list)
label = np.load('labels_v2.dat')
positive = np.zeros((300,400,4))   # accumulators must start at zero, np.empty leaves garbage
negative = np.zeros((300,400,4))
labels = np.empty(n)
count_p = 0
count_n = 0

for i in range(1000):
    img = imread(file_list[i])
    lbl = label[i]
    if (lbl == 1):
        positive += img
        count_p += 1
        print file_list[i]

However, this reads the files in the order 1, 10, 100, 1000, 10000, 10001, ... while my labels are in the order 0, 1, 2, 3, ... How can I make it read the files in the right order?

Abhishek Thakur

2 Answers

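import os
file_list = os.listdir(trainDir)
# Sort by the integer value of the base name; listdir returns bare names, so join them with trainDir before opening.
file_list.sort(key=lambda s: int(os.path.splitext(s)[0]))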
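
To see the effect without touching any image files, here is a minimal sketch on a made-up listing. The helper name `numeric_sorted` and the sample names are assumptions for illustration, not part of the question's code.

```python
import os

def numeric_sorted(names):
    """Return file names ordered by the integer value of their base name."""
    return sorted(names, key=lambda s: int(os.path.splitext(s)[0]))

# Hypothetical listing, in roughly the order the question describes.
names = ["1.png", "10.png", "100.png", "2.png", "0.png"]
ordered = numeric_sorted(names)
print(ordered)  # ['0.png', '1.png', '2.png', '10.png', '100.png']
```

This only works when every base name is a plain integer; a stray file such as `notes.txt` would make `int()` raise `ValueError`.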

Or, to skip the O(n log n) cost of sorting, build each file name from the loop index inside the loop:

img = imread(os.path.join(trainDir, "%d.EXT" % i))

where EXT is the appropriate extension (png for the files in the question).
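
A rough sketch of that index-based approach, kept free of image I/O so it runs anywhere. The function `indexed_paths`, the directory name `"train"`, and the three labels are made up for illustration; the real labels would come from `labels_v2.dat`.

```python
import os

def indexed_paths(train_dir, count, ext="png"):
    """Build paths 0.ext, 1.ext, ... so that position i matches label i."""
    return [os.path.join(train_dir, "%d.%s" % (i, ext)) for i in range(count)]

paths = indexed_paths("train", 3)   # "train" is a hypothetical directory
labels = [1, 0, 1]                  # made-up labels, one per file index

# Pair each path with its label and keep the positive samples.
positives = [p for p, lbl in zip(paths, labels) if lbl == 1]
print([os.path.basename(p) for p in positives])  # ['0.png', '2.png']
```

Because the file name is derived from `i`, the directory listing order never matters and no sort is needed.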

Fred Foo

You seem to want numeric order rather than lexicographical order on your sort. My first thought was:

import locale
l=["11", "01", "3", "20", "0", "5"]
l.sort(key=locale.strxfrm)    # strcoll would have to repeat the transform
print l

But that only helps if your locale actually sorts numbers that way, and I don't know what to set it to for that.

In the meantime, one workaround is to find the numbers in your sorting function.

def numfromstr(s):
  s2=''.join(c for c in s if c.isdigit())
  return int(s2)
l.sort(key=numfromstr)

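But this alone has the downside of sorting only on the digits: every other character in the name is ignored, and a name with no digits at all makes int('') raise ValueError. One could compensate by splitting on numeric boundaries and sorting the resulting tuples, but this is getting complex.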

import re
e=re.compile('([0-9]+|[^0-9]+)')
def sorttup(s):
  parts=[]
  for part in e.findall(s):
    try:
      parts.append(int(part))
    except ValueError:
      parts.append(part)
  return tuple(parts)
l.sort(key=sorttup)

Well, that's at least a bit closer, but it's neither pretty nor very fast.
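
One caveat with the tuple key above: under Python 3, comparing an int with a str raises TypeError, so it can fail on lists that mix names like "a1" and "1a". A possible variant, sketched here under that assumption, tags each chunk so that numbers are only ever compared with numbers. The name `natural_key` and the sample list are mine, not from any library.

```python
import re

_chunks = re.compile(r'([0-9]+|[^0-9]+)')

def natural_key(s):
    """Split s into digit and non-digit runs, tagging each run by type."""
    key = []
    for part in _chunks.findall(s):
        if part.isdigit():
            key.append((0, int(part)))   # numeric run: compare as an integer
        else:
            key.append((1, part))        # text run: compare as a string
    return tuple(key)

names = ["img10.png", "img2.png", "11", "3", "img1.png", "a"]
result = sorted(names, key=natural_key)
print(result)  # ['3', '11', 'a', 'img1.png', 'img2.png', 'img10.png']
```

The tag in the first slot decides any number-versus-text comparison, so numeric runs sort before text runs at the same position.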

Similar question with some more answers.

Yann Vernier