This is my first post; I hope you can help me. I have 120,000 satellite images of 33x33 px with 6 channels (RGB + 3 near-infrared). There are 10,000 sections with 12 images each (one for each month). Since a lot of them are covered by clouds, I want to find a way to get rid of the cloudy ones.

The images are in a NumPy array with the shape (10000, 12, 33, 33, 6), and I want to find subarrays of the shape (33, 33, 6) in which the average of ( : , : , :3 ) > 0.5 (some threshold of high RGB values indicating clouds).
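A minimal sketch of that detection step, assuming the pixel values are scaled to [0, 1] and that the first three channels are the RGB ones. The small random array and the names `images`, `THRESHOLD`, `cloudy`, and `cloudy_idx` are placeholders for illustration, not part of the original data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Small stand-in for the real (10000, 12, 33, 33, 6) array (assumed scaled to [0, 1]).
images = rng.random((5, 12, 33, 33, 6)) * 0.4
images[0, 3, :, :, :3] = 0.9  # simulate a cloudy image
images[2, 7, :, :, :3] = 0.8  # and another one

THRESHOLD = 0.5  # assumed cloud threshold from the question

# Average over height, width, and the first three channels:
# the result has one value per (section, month) pair.
rgb_mean = images[..., :3].mean(axis=(2, 3, 4))

# Boolean mask of shape (sections, months); True marks a cloudy image.
cloudy = rgb_mean > THRESHOLD

# Array of (section, month) index pairs of the cloudy images.
cloudy_idx = np.argwhere(cloudy)
```

The mask keeps the original array untouched, so it can be used either to skip the flagged images or to decide which ones to overwrite.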

As a first step, finding and deleting them would be helpful, but in order to keep the shape of the array for later training, I would need to replace them with one of the other 11 subarrays of that position.
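One possible way to do the replacement, sketched under the assumption that the nearest cloud-free month of the same section is an acceptable substitute (that choice, the function name `replace_cloudy`, and the test data are illustrative assumptions):

```python
import numpy as np

def replace_cloudy(images, threshold=0.5):
    """Return a copy in which each cloudy image is replaced by the nearest
    clear image (by month index) of the same section, plus the cloud mask."""
    cloudy = images[..., :3].mean(axis=(2, 3, 4)) > threshold  # (sections, months)
    out = images.copy()
    # Only visit sections that contain at least one cloudy image.
    for s in np.flatnonzero(cloudy.any(axis=1)):
        clear = np.flatnonzero(~cloudy[s])
        if clear.size == 0:
            continue  # nothing to borrow from; the section is left unchanged
        for m in np.flatnonzero(cloudy[s]):
            nearest = clear[np.argmin(np.abs(clear - m))]
            out[s, m] = images[s, nearest]
    return out, cloudy

rng = np.random.default_rng(0)
images = rng.random((3, 12, 33, 33, 6)) * 0.4  # stand-in data, assumed in [0, 1]
images[0, 3, :, :, :3] = 0.9
images[0, 4, :, :, :3] = 0.9
images[2, 7, :, :, :3] = 0.8

filled, cloudy = replace_cloudy(images)
```

The loop only runs over sections that actually contain cloudy images, and the array shape stays (sections, 12, 33, 33, 6). Sections in which all 12 images are cloudy are left as they are and would need separate handling.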

I have tried nested for-loops with np.average, np.where, np.delete, and so on, but did not get very far (I'm a beginner).

Would appreciate any hints, tips, or solutions.

Thanks!

Max

  • Add another array of the shape (10000, 12, boolean, 6) which indicates whether the image is used. IMHO the better concept would be a slightly different data structure: [10000,12,{[12,12,6],flags}], where you use flags to store meta information about the image. In later code you check the flags to decide how you want to treat the image. – planetmaker Jul 19 '21 at 10:36

1 Answer

If all you need is to find which of the "10k-12-i" observations are not useful (e.g., those whose mean over (:,:,:3) exceeds .5) for later use, the best thing to do is to create a separate data structure on the side to keep track of them.

To do that, you can use dictionaries. Since dictionaries accept any hashable object as a key, you could use index tuples such as (section, month) as keys to point to your RGB (sub-)arrays, and the values could be a "useful/not-useful" flag. Note that plain slice objects are only hashable from Python 3.12 onward, so tuples of integers are the safer choice for keys. That way, once you define your dictionary, with all the keys and an initial value ("useful", until you find out otherwise), you could rely entirely on your dictionary to "loop" over your arrays.
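A minimal sketch of this bookkeeping idea, assuming (section, month) integer tuples as keys and values scaled to [0, 1]. The names `useful` and `clear_images` and the stand-in data are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Small stand-in for the real (10000, 12, 33, 33, 6) array.
images = rng.random((4, 12, 33, 33, 6)) * 0.4
images[1, 0, :, :, :3] = 0.9  # simulate one cloudy image

n_sections, n_months = images.shape[:2]

# Every image starts out flagged as useful.
useful = {(s, m): True for s in range(n_sections) for m in range(n_months)}

# A (section, month) tuple can index the array directly: images[key] is (33, 33, 6).
for key in useful:
    if images[key][..., :3].mean() > 0.5:
        useful[key] = False

# Later code can rely on the dictionary alone to pick the images to use.
clear_images = [images[key] for key, ok in useful.items() if ok]
```

For 120,000 images a boolean array of shape (10000, 12) would hold the same information more compactly, but the dictionary makes it easy to attach further metadata per image.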

For slice objects, have a look here: https://numpy.org/doc/stable/reference/generated/numpy.s_.html . This question on NumPy slices may also help: How can I create a slice object for Numpy array? .

Cheers.

Brandt