
First, I have already read about this problem.

I have a np.array (loaded from a picture):

[[255 255 255 ... 255 255 255]
 [255 255 0 ... 255 255 255]
 [255 255 255 ... 255 255 255]
 ...
 [255 255 0 ... 0 255 255]
 [255 255 0 ... 255 255 255]
 [255 255 255 ... 255 255 255]]

I want to delete every row in which the number of zeros is smaller than a specific value. My code is:

import numpy as np
from collections import Counter

for i in range(pixelarray.shape[0]):
    # Counter(pixelarray[i])[0] is the number of zeros in row i.
    if Counter(pixelarray[i])[0] < 2:  # fewer than 2 zeros: delete the row
        pixelarray = np.delete(pixelarray, i, axis=0)  # delete the row
print(pixelarray)

But it raised the error:

Traceback (most recent call last):
  File "E:/work/Compile/python/OCR/PictureHandling.py", line 23, in <module>
    if Counter(pixelarray[i])[0] <= 1:
IndexError: index 183 is out of bounds for axis 0 with size 183

What should I do?

Kevin Mayo
  • You can just do `rows_with_min_zeros = pixelarray[(pixelarray == 0).sum(1) >= MIN_ZEROS]`. – jdehesa Apr 03 '20 at 14:22
  • @jdehesa Wow, that's very simple. Why not post it as an answer? Could you also tell me what it means and why my code raises the error? I am really new to numpy. – Kevin Mayo Apr 03 '20 at 14:25

3 Answers


np.delete is probably not the best choice for this problem. This can be solved simply by masking out the rows that do not meet the required criteria. For that, you start by counting the number of zeros per row:

zeros_per_row = (pixelarray == 0).sum(1)

This first compares each value in pixelarray with zero, and then sums the result along axis 1 (that is, it counts the True values in each row), so you get the number of zeros in each row. Then, you can simply do:

rows_with_min_zeros = pixelarray[zeros_per_row >= MIN_ZEROS]

Here, zeros_per_row >= MIN_ZEROS produces a boolean array that is True for every row whose zero count is greater than or equal to MIN_ZEROS. Using boolean array indexing, this can be used to exclude the rows where it is False, that is, the rows where the number of zeros is less than MIN_ZEROS.
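
As a minimal, self-contained sketch of the two steps above: the small sample array and the threshold MIN_ZEROS = 2 are made up for illustration and are not taken from the question's image.

```python
import numpy as np

# Hypothetical 4x5 "image": 255 is white, 0 is black.
pixelarray = np.array([
    [255, 255, 255, 255, 255],  # 0 zeros
    [255,   0,   0, 255, 255],  # 2 zeros
    [255, 255,   0, 255, 255],  # 1 zero
    [  0,   0,   0, 255, 255],  # 3 zeros
], dtype=np.uint8)

MIN_ZEROS = 2  # assumed threshold, chosen only for this example

# Step 1: count the zeros in each row.
zeros_per_row = (pixelarray == 0).sum(axis=1)

# Step 2: build a boolean mask and keep only the rows where it is True.
keep = zeros_per_row >= MIN_ZEROS
rows_with_min_zeros = pixelarray[keep]

print(zeros_per_row)              # [0 2 1 3]
print(rows_with_min_zeros.shape)  # (2, 5)
```

Nothing is deleted in place here: the mask selects the rows to keep and indexing returns a new array, so there is no index that can run past a shrinking array.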

jdehesa
if Counter(pixelarray[i])[0] <= 1:
IndexError: index 183 is out of bounds for axis 0 with size 183

In that expression, pixelarray[i] is the only part that could raise that error. It is a numpy error, telling us that i is too large for the current shape of pixelarray.

pixelarray is a 2d array. i counts upward toward pixelarray.shape[0] (the original shape). But you are deleting rows from pixelarray in the loop; it's shrinking. So at some point the counter goes beyond the current size of the array.

You would encounter this in base Python if you deleted elements from a list in a loop.

In [456]: alist = [1,2,3,4]                                                                    
In [457]: for i in range(len(alist)): 
     ...:     print(i, alist) 
     ...:     del alist[i] 
     ...:                                                                                      
0 [1, 2, 3, 4]
1 [2, 3, 4]
2 [2, 4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-457-5e5f105666aa> in <module>
      1 for i in range(len(alist)):
      2     print(i, alist)
----> 3     del alist[i]
      4 

IndexError: list assignment index out of range

See how the list shrinks even as i increases. By i=2 the list is down to 2 items, so alist[2] is no longer valid. Notice also that the second iteration dropped the '3', not the '2'. If my intent was to drop consecutive values from the list, this does not work.

With lists, the way around this sort of issue is to delete from the end:

In [463]: for i in range(len(alist),0,-1): 
     ...:     print(i, alist) 
     ...:     del alist[i-1] 
     ...:      
     ...:                                                                                      
4 [1, 2, 3, 4]
3 [1, 2, 3]
2 [1, 2]
1 [1]
In [464]: alist                                                                                
Out[464]: []

In your case, np.delete makes a new array with each call. For arrays this is quite inefficient. So, indexing issue or not, we discourage iterative deletes like this. You could, though, collect all the desired 'delete' indices in a list (list append is efficient), and do one delete at the end. np.delete accepts a list of indices.
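
A rough sketch of that "collect the indices, delete once" idea follows. The sample array, the MIN_ZEROS threshold, and the to_delete name are illustrative assumptions, not part of the original code.

```python
import numpy as np

# Hypothetical 4x5 array standing in for the image data.
pixelarray = np.array([
    [255, 255, 255, 255, 255],  # 0 zeros
    [255,   0,   0, 255, 255],  # 2 zeros
    [255, 255,   0, 255, 255],  # 1 zero
    [  0,   0,   0, 255, 255],  # 3 zeros
], dtype=np.uint8)

MIN_ZEROS = 2  # assumed threshold, chosen only for this example

# Pass 1: only record which rows should go; the array is not modified,
# so i always refers to a valid row of the original array.
to_delete = []
for i in range(pixelarray.shape[0]):
    if np.count_nonzero(pixelarray[i] == 0) < MIN_ZEROS:
        to_delete.append(i)

# Pass 2: a single np.delete call with the whole list of row indices.
result = np.delete(pixelarray, to_delete, axis=0)

print(to_delete)     # [0, 2]
print(result.shape)  # (2, 5)
```

Because all indices refer to the original array and only one new array is built, this avoids both the out-of-bounds index and the repeated copying.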

hpaulj
  • Wow, thanks for your patience! It tells me why my code is wrong. Now I really don't know which answer I should accept. It is a hard decision. – Kevin Mayo Apr 03 '20 at 15:52
  • My focus is on why you got the error. That's a basic Python programming issue. – hpaulj Apr 03 '20 at 15:55

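Iterate over a copy of pixelarray so the loop bounds do not change, and go from the last row to the first so that deleting a row does not shift the indices still to be visited. Try this:

import numpy as np
from collections import Counter
from copy import copy

pixelarray2 = copy(pixelarray)  # unchanged copy, used only for counting
for i in reversed(range(pixelarray2.shape[0])):
    # Counter(pixelarray2[i])[0] is the number of zeros in row i.
    if Counter(pixelarray2[i])[0] < 2:  # fewer than 2 zeros: delete the row
        # Rows 0..i of pixelarray are still at their original positions here.
        pixelarray = np.delete(pixelarray, i, axis=0)
print(pixelarray)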
Muhammad Moiz Ahmed