
I want to go through an array and find anomalies (specifically, values of 100 or higher).

I then want the anomalous cell to be replaced by the mean value of the surrounding cells.

Let's say the input is:

[6, 28, 33]
[20, 100, 41]
[87, 3, 39]

I want to change that 100 cell into the integer mean of

6, 28, 33, 20, 41, 87, 3, 39

which is 32
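
As a quick check of that arithmetic, here is a minimal sketch; the variable names are only illustrative, and "integer mean" is assumed to mean the mean with the fractional part dropped:

```python
# The eight cells surrounding the 100 in the example input.
neighbours = [6, 28, 33, 20, 41, 87, 3, 39]

# Integer division drops the fractional part of 257 / 8 = 32.125.
integer_mean = sum(neighbours) // len(neighbours)
print(integer_mean)  # 32
```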

The output should be:

[6, 28, 33]
[20, 32, 41]
[87, 3, 39]

If it matters, my real array is 256*256 cells and has ~500 values I want to change.

Last point!

I know that there would be a problem fixing up the edges, so I don't care whether they are thrown out completely or some clever piece of code can average without them.

Here is my attempt:

import numpy as np
array = np.random.randint(100, size=(256, 256))
for x in array:
    if x >= 100:
        x = np.mean(x+1, x-1)
# This is where I got stuck... trying to define the surrounding cells
anakar
  • For the edges would it make sense to treat them as if the right edge was attached to the left edge and the bottom to the top--or would the values from opposite edges be very different and not appropriate to average? Also of course the question of how to handle adjacent values greater than 100...or what if in the extreme case that a whole area of the matrix has values over 100. Except for these issues seems a pretty straightforward problem. – Andrew Allaire Nov 24 '20 at 15:46
  • @AndrewAllaire It doesn't make sense to wrap the edges. Regarding adjacent values greater than 100, or a whole area - I have not thought about it! these points are so spread out it didn't occur to me. I'm not sure how I would handle it. – anakar Nov 25 '20 at 08:30
  • @PranavHosangadi Added my attempt. I guess the trouble for me is defining the surrounding cells – anakar Nov 25 '20 at 08:39

1 Answer


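So I know this may not be the most efficient solution, but it works for the 3x3 example. It averages the anomaly's own row (excluding the anomaly itself) together with the entire rows above and below, so on an array wider than three columns it takes in more than the eight immediate neighbours. If the anomaly is in the first or last row, it uses only the one adjacent row, not both.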

inp = [[6, 10, 33],
       [20, 100, 41],
       [87, 3, 39]]

for i in range(len(inp)):
    for j in range(len(inp[i])):
        if inp[i][j] >= 100:
            if i == len(inp) - 1:
                # Last row: average this row (minus the anomaly) with the row above
                inp[i][j] = int((sum(inp[i])-inp[i][j] + sum(inp[i-1])) / (len(inp[i])-1 + len(inp[i-1])))
            elif i == 0:
                # First row: average this row (minus the anomaly) with the row below
                inp[i][j] = int((sum(inp[i])-inp[i][j] + sum(inp[i+1])) / (len(inp[i])-1 + len(inp[i+1])))
            else:
                # Interior row: average this row (minus the anomaly) with the rows above and below
                inp[i][j] = int((sum(inp[i])-inp[i][j] + sum(inp[i+1]) + sum(inp[i-1])) / (len(inp[i])-1 + len(inp[i+1]) + len(inp[i-1])))

Again, this solution isn't particularly efficient or pretty, but it does the job.
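
For a larger array such as the 256*256 one in the question, a NumPy version that looks only at the 3x3 window around each anomaly may be closer to what was asked. The following is a minimal sketch, not a definitive implementation: the function name `replace_anomalies` and its `threshold` parameter are made up for illustration. It assumes that edge cells should simply average whichever neighbours exist (no wrapping), and that neighbours which are themselves anomalous should be left out of the mean.

```python
import numpy as np

def replace_anomalies(arr, threshold=100):
    """Return a copy of arr with each value >= threshold replaced by the
    integer mean of its non-anomalous neighbours (hypothetical helper)."""
    arr = np.asarray(arr)
    out = arr.copy()
    bad = arr >= threshold  # boolean mask of anomalous cells

    for i, j in np.argwhere(bad):
        # Clip the 3x3 window to the array bounds so edges do not wrap around.
        r0, r1 = max(i - 1, 0), min(i + 2, arr.shape[0])
        c0, c1 = max(j - 1, 0), min(j + 2, arr.shape[1])
        window = arr[r0:r1, c0:c1]
        good = ~bad[r0:r1, c0:c1]  # excludes the cell itself and other anomalies

        # Assumption: if every neighbour is anomalous too, leave the cell as it is.
        if good.any():
            out[i, j] = int(window[good].mean())
    return out

grid = [[6, 28, 33],
        [20, 100, 41],
        [87, 3, 39]]
result = replace_anomalies(grid)  # the centre cell becomes 32
```

Reading the means from the original array and writing them into a copy keeps one replacement from influencing another. With only ~500 anomalies, the Python loop runs a few hundred times, so it should not be a bottleneck.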

Luke LaBonte