I am trying to calculate intensities in different regions of an image (acquired as 16-bit). I first create binary masks of the different regions, then use the masks to calculate the intensities. However, I was getting negative values when summing the intensities. If I use np.int64 the problem does not happen, but the values change if I use int16/int32. So which one is correct, and did I put it in the correct place? Also, is there any difference between using np.int64 and astype("uint64")?

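import numpy as np
from scipy.ndimage import gaussian_filter  # assumed source; imread's origin is not shown

image = imread(path)  # 16-bit image
nucleus = gaussian_filter(image, 5)
nucleus_mask = np.where(nucleus > 10000, 1, 0)  # 1 inside nuclei, 0 elsewhere

tissue = gaussian_filter(image, 5)
tissue_mask = np.where(tissue > 1000, 1, 0)  # 1 inside tissue, 0 elsewhere

# tissue without the nuclei; clip any negative values to 0
only_tissue = tissue_mask - nucleus_mask
only_tissue = np.where(only_tissue < 0, 0, only_tissue)

autoAb = imread(auto_path)
# summed intensity of each region
auto_nucleus = np.sum(np.int64(nucleus_mask * autoAb))
auto_cyto = np.sum(np.int64(only_tissue * autoAb))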
Christoph Rackwitz
Deep
  • `int16` and `int32` are signed types. If the actual values are unsigned, treating them as signed will produce negative numbers for some values. – Barmar Apr 11 '23 at 21:01

1 Answer

The sum of many small values will be a large value.

That large value may be too large to fit in a 16-bit or 32-bit integer, signed or not.

If you have 65538 pixels (which is roughly 256 by 256 pixels), each containing the value 65535, the sum of all those (that is, the product 65538 × 65535) is 0x10000fffe, and that doesn't fit in 32 bits.

If you took only half that many pixels (32769, roughly 181 by 181 pixels) with maximal values, you'd still overflow a signed 32-bit integer, and the result would be negative.
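The arithmetic above can be checked directly. This is a minimal sketch using synthetic data (an array of 32769 maximal 16-bit values stands in for real pixels); the variable names are made up for illustration.

```python
import numpy as np

# Python integers have arbitrary precision, so these products are exact
total_full = 65538 * 65535   # 0x10000fffe
total_half = 32769 * 65535   # 0x80007fff

fits_u32 = total_full <= np.iinfo(np.uint32).max   # False: needs 33 bits
fits_i32 = total_half <= np.iinfo(np.int32).max    # False: exceeds 2**31 - 1

# Force a signed 32-bit accumulator to reproduce the negative result
pixels = np.full(32769, 65535, dtype=np.uint16)
wrapped = pixels.sum(dtype=np.int32)

print(hex(total_full), fits_u32, fits_i32, wrapped)
```

NumPy integer arrays wrap around silently on overflow, which is why the last sum comes out negative instead of raising an error.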

You can use np.sum() with the dtype argument, e.g. dtype=np.int64. The input elements keep their original type; only the accumulator (and therefore the result) uses the wider type, so the sum will be correct.
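As a rough illustration of the dtype argument, the sketch below sums a synthetic 512 by 512 image filled with the maximum 16-bit value (a stand-in, not the asker's data) once with a 64-bit accumulator and once with a 32-bit one.

```python
import numpy as np

# Synthetic stand-in for a 512 x 512 16-bit image at full brightness
image = np.full((512, 512), 65535, dtype=np.uint16)

true_total = 512 * 512 * 65535   # exact, computed with Python integers

# Accumulate in 64 bits: the input array stays uint16, the result is exact
total_wide = np.sum(image, dtype=np.uint64)

# Accumulate in 32 bits: the sum wraps around modulo 2**32
total_narrow = np.sum(image, dtype=np.uint32)

print(true_total, total_wide, total_narrow)
```

No copy of the image in the wider type is needed; only the running total is widened.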

NumPy mostly does not care whether you write np.uint64, "uint64", or "u8".
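A small sketch of what that means in practice, with a toy array made up for illustration. The three spellings name the same dtype, whereas np.int64 and "uint64" from the question are different types (signed versus unsigned), which only matters once negative values are involved.

```python
import numpy as np

# Three spellings of the same unsigned 64-bit dtype
same = np.dtype(np.uint64) == np.dtype("uint64") == np.dtype("u8")

a = np.arange(4, dtype=np.uint16)
as_signed = np.int64(a)             # converts the array to signed 64-bit
as_unsigned = a.astype("uint64")    # converts the array to unsigned 64-bit

# For non-negative data both conversions hold identical values
equal_values = np.array_equal(as_signed, as_unsigned)

# A negative value wraps around when cast to an unsigned type
wrapped = np.array([-1], dtype=np.int64).astype("uint64")[0]

print(same, as_signed.dtype, as_unsigned.dtype, equal_values, wrapped)
```

For sums of 16-bit pixel values, which are never negative, either 64-bit type leaves ample headroom.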

Christoph Rackwitz
  • Thanks. So the size is 512 by 512. To confirm, is what I did in the code correct? – Deep Apr 11 '23 at 21:15
  • uhhh your code has several issues. you should just use the boolean array itself: `nucleus_mask = (nucleus > 10000)`. ditch the `where()`. no need to multiply either. `autoAb[nucleus_mask].sum()` – Christoph Rackwitz Apr 11 '23 at 21:18
  • So thanks again for the help. This was new but indeed helpful. I actually need to compute the mean intensity later, so I use the masks to calculate the area of the masks. So the final value I calculate is total_intensity_region/area_region. I know I can use the boolean values to find the area, but idk if that would work when I am doing the subtraction. And earlier, did you mean I should use np.sum(autoAb*nucleus_mask, dtype=np.int64)? – Deep Apr 11 '23 at 22:01
  • for the mean, you can just use `np.mean()`, which may or may not be a method as well. boolean arrays can be combined with `&` and `|` and negated with `~`. sum() always picks a sensible data type for the accumulator. I'm surprised you were able to get negative values at all. sum() should not be able to do that unless the type of the accumulator is given specifically. – Christoph Rackwitz Apr 11 '23 at 22:04
  • I think np.mean() will take the mean of all the values in the array. I am not looking for that. – Deep Apr 11 '23 at 22:07
  • `np.mean()` will take the mean of whatever you pass it. if you pass it the result of boolean indexing, `autoAb[nucleus_mask]`, then that will do the right thing. – Christoph Rackwitz Apr 11 '23 at 22:09
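The suggestions in the comments above can be put together roughly as follows. This is only a sketch: random arrays stand in for the smoothed image and the autoAb channel, the thresholds are the ones from the question, and names such as `smoothed` and `auto_ab` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the Gaussian-filtered image and the autoAb channel
smoothed = rng.integers(0, 65536, size=(512, 512), dtype=np.uint16)
auto_ab = rng.integers(0, 65536, size=(512, 512), dtype=np.uint16)

# Boolean masks: no np.where() needed
nucleus_mask = smoothed > 10000
tissue_mask = smoothed > 1000

# "Tissue but not nucleus" replaces the subtraction and the clipping step
only_tissue = tissue_mask & ~nucleus_mask

# Boolean indexing selects just the pixels of each region
nucleus_total = auto_ab[nucleus_mask].sum(dtype=np.uint64)
nucleus_area = np.count_nonzero(nucleus_mask)
nucleus_mean = auto_ab[nucleus_mask].mean()

cyto_total = auto_ab[only_tissue].sum(dtype=np.uint64)
cyto_area = np.count_nonzero(only_tissue)
cyto_mean = auto_ab[only_tissue].mean()

print(nucleus_total, nucleus_area, nucleus_mean)
print(cyto_total, cyto_area, cyto_mean)
```

Here the mean of the indexed pixels equals total divided by area, so the area only needs to be computed separately if it is wanted for its own sake. An empty mask would give a mean of NaN, so that case may need a guard.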