I have a NumPy 3D array that is just a list of grayscale images:

import cv2
import numpy as np

images = np.zeros((xlen, height, width), dtype=int)
for i in range(xlen):
    images[i] = cv2.imread(filename[i], cv2.IMREAD_GRAYSCALE)

All the images are mostly the same, but each of them has some random noise pixels. My idea is that a noise pixel is the max or min value compared to the same pixel in the other images.

So I need to:

  1. Find the min and max values for each pixel across the stack
  2. Calculate the average value for each pixel over all the images, without those max and min values
  3. Replace all min and max values with the calculated average (see the tiny worked example after this list)
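
For one pixel location this is just a trimmed mean. A tiny worked example with made-up values:

    # one pixel observed across xlen = 5 images; 250 is a noise spike, 9 the minimum
    vals = [12, 10, 11, 250, 9]
    trimmed = sorted(vals)[1:-1]          # drop one min and one max
    print(sum(trimmed) // len(trimmed))   # -> 11, the replacement value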

I implemented that in a naive way using standard Python loops, but that is too slow:

    # remove the highest and lowest value for each pixel
    for el in range(height):
        for em in range(width):
            mylist = []
            for j in range(xlen):
                mylist.append(images[j][el][em])
            indmin = mylist.index(min(mylist))
            indmax = mylist.index(max(mylist))
            temp_counterx = 0
            temp_sum = 0
            for j in range(xlen):
                if (j != indmin) and (j != indmax):
                    temp_counterx += 1
                    temp_sum += mylist[j]
            temp_val = int(temp_sum / temp_counterx)
            images[indmin][el][em] = temp_val
            images[indmax][el][em] = temp_val

Is it possible to speed that up using NumPy?

UPD: Accepted the solution proposed by flawr, with some minor changes:

    mins = np.min(images, axis=0)
    maxs = np.max(images, axis=0)
    sums = np.sum(images, axis=0)
    # compute the mean without the extremes
    mean_without_extremes = (sums - mins - maxs) / (xlen - 2)
    mean_without_extremes = mean_without_extremes.astype(int)

    # replace minima and maxima with the mean
    images = np.where(images == mins, mean_without_extremes, images)
    images = np.where(images == maxs, mean_without_extremes, images)
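
For anyone who wants to sanity-check that snippet, here is a minimal self-contained run on synthetic data (the sizes, seed, and injected noise pixels are made up for illustration):

    import numpy as np

    xlen, height, width = 5, 4, 4
    rng = np.random.default_rng(0)
    base = rng.integers(50, 200, size=(height, width))
    images = np.broadcast_to(base, (xlen, height, width)).copy()
    images[0, 1, 1] = 255   # inject a bright noise pixel
    images[3, 2, 2] = 0     # inject a dark noise pixel

    mins = np.min(images, axis=0)
    maxs = np.max(images, axis=0)
    sums = np.sum(images, axis=0)
    mean_without_extremes = ((sums - mins - maxs) / (xlen - 2)).astype(int)

    images = np.where(images == mins, mean_without_extremes, images)
    images = np.where(images == maxs, mean_without_extremes, images)

    assert images[0, 1, 1] != 255 and images[3, 2, 2] != 0  # noise pixels are gone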

...and got a 30-fold speed increase! It seems that NumPy provides a really fast and powerful computing engine; usage can just be tricky at times due to the complex data structures it deals with.

Cepera

2 Answers

First off, to compute things like a mean you probably want to use floating-point numbers instead of integers to start with. So in the following I assume you use those instead.
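
As a quick illustration of the truncation you would otherwise run into (shapes and values made up):

    import numpy as np

    images = np.zeros((3, 2, 2), dtype=int)
    images[0] = 0.9                # floats written into an int array are truncated to 0
    images = images.astype(float)  # so convert the stack once, up front
    print(images.dtype)            # float64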

By using Python loops you're giving away all the advantages of numpy, because they are inherently slow, at least compared to the underlying compiled code that is executed when calling numpy functions. If you want your code to be reasonably fast you should make use of vectorization. Consider the following code that does what you ask for, but without any loops in Python:

# compute minima, maxima and sum
mins = np.min(images, axis=0)
maxs = np.max(images, axis=0)
sums = np.sum(images, axis=0)
# compute the mean without the extremes
mean_without_extremes = (sums - mins - maxs) / (xlen - 2)
# replace maxima with the mean
images[images == mins] = mean_without_extremes.reshape(-1)
images[images == maxs] = mean_without_extremes.reshape(-1)

As you're probably not familiar with that, I recommend reading the introductions in the documentation about indexing and broadcasting in order to use numpy efficiently.
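
To give a flavour of what those pages cover, here is a tiny sketch with made-up shapes:

    import numpy as np

    images = np.arange(24, dtype=float).reshape(2, 3, 4)  # (xlen, height, width)
    mins = images.min(axis=0)                             # shape (3, 4)
    # comparing (2, 3, 4) against (3, 4) broadcasts mins across axis 0
    mask = images == mins                                 # shape (2, 3, 4)
    print(mask.sum())  # 12 -> exactly one minimum per pixel location here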


EDIT: As pointed out in the comments, the solution above only works for xlen > 2 and if the extrema are attained only once per pixel location. This could be fixed by replacing those lines with

    images = np.where(images == mins, mean_without_extremes, images)
    images = np.where(images == maxs, mean_without_extremes, images)
    images[np.isnan(images)] = 0  # set "empty mean" (the xlen == 2 case) to zero
    # using "np.where" as suggested by OP
    # we can actually reduce that to one "np.where" call, which might be slightly faster:
    images = np.where(np.logical_or(images == mins, images == maxs), mean_without_extremes, images)
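
To see why the boolean-assignment version breaks when an extreme value repeats, consider this contrived one-pixel stack (values made up for illustration):

    import numpy as np

    images = np.array([[[5.0]], [[5.0]], [[1.0]]])  # the max, 5, occurs twice
    mask = np.logical_or(images == images.min(axis=0), images == images.max(axis=0))
    print(mask.sum())  # 3 matches for only 1 pixel location, so a flat
                       # assignment of one mean value per pixel cannot line up
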
flawr
  • Wow, didn't expect it would be that plain. I tried to invent something bulky using unravel_index. Unfortunately your code isn't running because of an error on the line 'images[images == mins]...': NumPy boolean array indexing assignment cannot assign X input values to the Y output values, where X is the image size and Y is about 5% more than the image size. All the shape printouts show X, so I cannot understand where Y comes from. Any ideas? – Cepera May 10 '21 at 00:37
  • I guess that additional 5% of array size comes from the rows that have a few equal values that are the min or max. numpy.ma could be the solution here, right? – Cepera May 10 '21 at 01:25
  • Changed the last 2 lines to np.where() and it ran smoothly. Thank you for your answer. I have no rep yet to upvote, so I'll return to it a bit later. Your help is appreciated. – Cepera May 10 '21 at 03:02
  • Ah, good point, I hadn't considered the case where multiple values are minimal or maximal. I'm glad you could figure it out! – flawr May 10 '21 at 09:27

Make sure that everything you use is a NumPy array and NOT a Python list, and make sure that all of the members have the same data type. In your case this is true.
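
A quick way to verify both points on the stack from the question (this assumes the images array from there already exists):

    import numpy as np

    print(isinstance(images, np.ndarray))  # True -> an array, not a Python list
    print(images.dtype)                    # one dtype shared by every element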

Now you can make use of a library called numba. It uses JIT (just-in-time) compilation.
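
As a rough sketch of what that could look like here, jitting the question's naive loop (this assumes images is a NumPy integer array of shape (xlen, height, width) with xlen > 2; the function name denoise is mine):

    import numpy as np
    from numba import njit

    @njit
    def denoise(images):
        xlen, height, width = images.shape
        out = images.copy()
        for el in range(height):
            for em in range(width):
                col = images[:, el, em]
                imin = np.argmin(col)
                imax = np.argmax(col)
                total = 0
                for j in range(xlen):
                    if j != imin and j != imax:
                        total += col[j]
                val = total // (xlen - 2)
                out[imin, el, em] = val
                out[imax, el, em] = val
        return out

The first call pays the compilation cost; later calls run at compiled speed.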

A video demonstrating it can be seen here.

The documentation for numba can be seen here.

Suzuna Minami
  • While `numba` can help in certain cases, I believe it is completely unnecessary here. The first thing OP should do is learn how to use numpy efficiently before trying this. – flawr May 09 '21 at 22:36
  • Of course I've tried to stay as far from the Python list type as I can; I used the word 'list' to explain the data organization. Numba looks like an interesting alternative to slow Python code execution, I definitely have to try it in my projects. – Cepera May 10 '21 at 03:13