
I have already read an image into an array:

import numpy as np
from scipy import misc
face1=misc.imread('face1.jpg')

face1 dimensions are (288, 352, 3)

I need to iterate over every pixel and populate a y column in a training set. I took the following approach:

Y_training = np.zeros([1,1],dtype=np.uint8)

for i in range(0, face1.shape[0]): # We go over rows number 
    for j in range(0, face1.shape[1]): # we go over columns number
        if np.array_equiv(face1[i,j],[255,255,255]):
           Y_training=np.vstack(([0], Y_training))#0 if blank
        else:
           Y_training=np.vstack(([1], Y_training))

b = len(Y_training)-1
Y_training = Y_training[:b]
np.shape(Y_training)

Wall time: 2.57 s

I need to repeat this process for about 2000 images. Is there a faster approach that would cut the running time to milliseconds or nanoseconds?

chessosapiens
  • For future similar problems you might want to try the numba `@njit` decorator, so that you get a JIT-compiled function that runs very fast. – Gioelelm Aug 02 '17 at 09:25

1 Answer


You can use broadcasting to compare every pixel against the white pixel [255, 255, 255], reduce the three channel comparisons of each pixel with .all(axis=-1), and finally convert the result to an int dtype. This yields the per-pixel labels that the loop computes (0 for white, 1 otherwise), as a flat array in row-major order.

Thus, one implementation would be -

(~((face1 == [255,255,255]).all(-1).ravel())).astype(int)

Alternatively, a bit more compact version -

1-(face1 == [255,255,255]).all(-1).ravel()
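
As a quick sanity check of the two expressions above, here is a minimal sketch on a small synthetic array. The array contents and the names `is_white`, `labels_a`, `labels_b` and `zero_positions` are made up for illustration and are not part of the question. Only the two pixels that are set to white should end up labelled 0.

```python
import numpy as np

# Hypothetical 4x5 RGB test image; values are drawn from 0..254,
# so no pixel can be white by chance.
rng = np.random.default_rng(0)
face1 = rng.integers(0, 255, size=(4, 5, 3), dtype=np.uint8)

# Two white pixels, plus one non-white pixel that has a single channel at 255.
face1[2, 3] = [255, 255, 255]
face1[1, 2] = [255, 255, 255]
face1[1, 3] = [255, 0, 0]

# Broadcast the comparison over all pixels, then require all 3 channels to match.
is_white = (face1 == [255, 255, 255]).all(axis=-1)  # boolean, shape (4, 5)

labels_a = (~is_white.ravel()).astype(int)  # first form from the answer
labels_b = 1 - is_white.ravel()             # compact form from the answer

# Flat (row-major) indices of the pixels labelled 0.
zero_positions = np.flatnonzero(labels_a == 0)
print(labels_a.reshape(4, 5))
print(zero_positions)
```

The pixel with only one channel at 255 stays labelled 1, because `.all(axis=-1)` requires every channel to equal 255.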
Divakar
  • I think it should be `(~((face1 != [255,255,255]).all(-1).ravel())).astype(int)`, as I need to label white pixels with `0` – chessosapiens Jul 27 '17 at 14:50
  • @sanaz Nope. Please check again. – Divakar Jul 27 '17 at 14:51
  • `(~((face1 == [255,255,255]).all(-1).ravel())).astype(int)` returns `array([1, 1, 1, ..., 1, 1, 1])`, which labels white pixels with `1`; my own for-loop approach labels them with `0` – chessosapiens Jul 27 '17 at 14:56
  • @sanaz Try a small sample at your end: `face1 = np.random.randint(0,255,(4,5,3))`. Then set two pixels to white: `face1[2,3] = [255,255,255]; face1[1,2] = [255,255,255]`. Now set a third pixel to a non-white colour with one channel at 255: `face1[1,3] = [255,0,0]`. Then run all the methods you have and see which ones give you an array with exactly two 0s. Those are the correct ones. – Divakar Jul 27 '17 at 14:59
  • What would be the most efficient way to get the result as a column vector rather than a row vector, i.e. `(101376,1)` instead of `(101376,)`? – chessosapiens Jul 27 '17 at 15:53
  • @sanaz NumPy doesn't have the concept of row/column vector. But guessing from the format you have in the question, I would say row vector would be best, whereas making column vector from it won't be too costly either. – Divakar Jul 27 '17 at 15:56
  • What I meant was having `(101376,1)` instead of `(101376,)` – chessosapiens Jul 27 '17 at 15:59
  • @sanaz Yeah, that's what I guessed and my comments were based on that assumption. – Divakar Jul 27 '17 at 16:01
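
For the `(101376, 1)` shape asked about in the comments, one option is to reshape the flat result. The sketch below assumes the image shape from the question and uses an all-white stand-in array instead of the real `face1.jpg`; the names `flat`, `column` and `column_alt` are illustrative only.

```python
import numpy as np

# Stand-in for the real image: an all-white array with the question's shape.
face1 = np.full((288, 352, 3), 255, dtype=np.uint8)
face1[0, 0] = [10, 20, 30]  # a single non-white pixel

# Flat labels: 0 for white pixels, 1 for everything else.
flat = (~(face1 == [255, 255, 255]).all(axis=-1).ravel()).astype(np.uint8)

# Reshaping a contiguous 1-D array returns a view, so this step copies no data.
column = flat.reshape(-1, 1)   # shape (101376, 1)
column_alt = flat[:, None]     # equivalent, using a new axis

print(flat.shape, column.shape)
```

Note that the loop in the question prepends each new label with `np.vstack`, so its result lists the pixels in reverse scan order. If that ordering matters, `column[::-1]` should reproduce it.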