How to sample a huge 2D array in Python using 2x2 arrays to create a dictionary? (Stencil Algorithm for Python)

Question

I am rather new to programming, so I apologise if this is a classic and trivial question. I have a 100x100 2D array of values which is plotted by means of matplotlib. In this image, each cell has its value (ranging 0.0 to 1.0) and ID (ranging 0 to 9999 starting from the upper left corner). I want to sample the matrix by using a 2x2 moving window which produces two dictionaries:

1st dictionary: the key represents the intersection of 4 cells; the value represents the tuple with the IDs of the 4 neighboring cells (see image below - the intersection is represented by "N");
2nd dictionary: the key represents the intersection of 4 cells; the value represents the mean value of the 4 neighboring cells (see image below).

In the example below (upper left panel), where N has ID=0, the 1st dictionary would yield {'0': (0,1,100,101)} since the cells are numbered 0 to 99 toward the right hand side and 0 to 9900, step=100, downward. The 2nd dictionary would yield {'0': 0.775}, as 0.775 is the average value of the 4 neighboring cells of N. Of course, these dictionaries must have as many keys as "intersections" I have on the 2D array.

How can this be accomplished? And are dictionaries the best "tool" in this case? Thank you guys!

PS: I tried my own way but my code is incomplete, wrong, and I cannot get my head around it:

a=... #The 2D array which contains the cell values ranging 0.0 to 1.0
neigh=numpy.zeros(4)
mean_neigh=numpy.zeros(10000/4)
for k in range(len(neigh)):
    for i in a.shape[0]:
        for j in a.shape[1]:
            neigh[k]=a[i][j]
            ...

I don't know if this can help but at least I can tell that what you describe is commonly known as a stencil algorithm which is some sort of 2D finite impulse response filter. If you have `S` samples, and given your pattern, you should consider traversing the matrix from 1 to S (not from 0 to S) and apply whatever operation you describe. — Emilien, Oct 19 '15 at 14:47

score 1 · Accepted Answer · 2015-10-19T15:28:13.550

Well, dictionaries may in fact be the way in your case.

Are you sure that the numpy.array format you're using is correct? I can't find any array((int, int)) form in the API. Anyway...

What to do once you have your 2D array declared

To keep things organized, let's write two functions that work with any square 2D array and return the two dictionaries that you need:

# Returns the first dictionary: intersection number -> values of the 4 surrounding cells
def dictionarize1(array):
    dict1 = {}
    count = 0
    for x in range(len(array) - 1):          # rows
        for y in range(len(array[0]) - 1):   # columns
            dict1[count] = [array[x][y], array[x][y+1], array[x+1][y], array[x+1][y+1]]
            count = count + 1
    return dict1

# Returns the second dictionary: intersection number -> mean of the 4 surrounding cells
def dictionarize2(array):
    dict2 = {}
    counter = 0
    for a in range(len(array) - 1):          # rows
        for b in range(len(array[0]) - 1):   # columns
            # divide by 4.0 so the mean is a float even for integer inputs
            dict2[counter] = (array[a][b] + array[a][b+1] + array[a+1][b] + array[a+1][b+1]) / 4.0
            counter = counter + 1
    return dict2

#here's a little trial code to see them working

eighties = [[2.0, 2.2, 2.6, 5.7, 4.7],
            [2.1, 2.3, 2.3, 5.8, 1.6],
            [2.0, 2.2, 2.6, 5.7, 4.7],
            [2.0, 2.2, 2.6, 5.7, 4.7],
            [2.0, 2.2, 2.6, 5.7, 4.7]]
print("Dictionarize1: \n")
print(dictionarize1(eighties))
print("\n\n")
print("Dictionarize2: \n")
print(dictionarize2(eighties))
print("\n\n")

Compared to the first version of this code, I preferred using an integer as the key, because the dictionary then prints in key order. Strictly speaking, that ordering is an implementation detail in older Python versions; since Python 3.7, dictionaries keep insertion order, which gives the same result here because the keys are added in increasing order. You can still change the key back to a string by using str(count), as I did before.

I hope this helps. I'm not very familiar with math libraries, but the code I wrote should work with any square 2D array that you want to pass in as input!
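The question asks for cell IDs, not cell values, in the first dictionary. A minimal sketch of that variant follows, assuming IDs run row by row from the upper-left corner (ID = row * number_of_columns + column), as the question describes. The names `neighbour_ids` and `neighbour_means` are hypothetical and not part of the answer above. Both loop over rows and columns separately, so they also accept non-square arrays.

```python
def neighbour_ids(n_rows, n_cols):
    """Map each intersection number to the IDs of its four surrounding cells.

    Assumes cell IDs run row by row from the upper-left corner,
    i.e. ID = row * n_cols + col.
    """
    ids = {}
    key = 0
    for r in range(n_rows - 1):
        for c in range(n_cols - 1):
            top_left = r * n_cols + c
            ids[key] = (top_left, top_left + 1,
                        top_left + n_cols, top_left + n_cols + 1)
            key += 1
    return ids


def neighbour_means(grid):
    """Map each intersection number to the mean of its four surrounding cells."""
    n_rows, n_cols = len(grid), len(grid[0])
    means = {}
    key = 0
    for r in range(n_rows - 1):
        for c in range(n_cols - 1):
            means[key] = (grid[r][c] + grid[r][c + 1]
                          + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
            key += 1
    return means


ids = neighbour_ids(100, 100)
print(ids[0])    # (0, 1, 100, 101)
print(len(ids))  # 9801
```

A 100x100 array has 99 * 99 = 9801 intersections, so that is the number of keys in each dictionary.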

edited Oct 19 '15 at 15:28

answered Oct 19 '15 at 15:08

You are absolutely right about the array format. I have in fact amended my code block, even though it is useless since it does not mention dicts... – FaCoffee Oct 19 '15 at 15:16
I've made one last fix; the trial code should work now too. Anyway, once you find a way to declare the array that you want, this code should work. – Oct 19 '15 at 15:31
The `a` array comes, in reality, from a `numpy.genfromtxt(directoryPath, delimiter=",")` line. Later, it is converted into a list. – FaCoffee Oct 19 '15 at 15:34
So, would you pass a list or an array as input? – Oct 19 '15 at 15:45
The `numpy` array is converted into a `list` and it is the list that will be processed. I thought this was the best way to get this job done. I wonder what the speed of these two pieces of code would be with a `10201x1` array... – FaCoffee Oct 19 '15 at 15:48
Mmmh... can't you leave it as an array of arrays (which is how 2D arrays work)? That would be faster than lists, if I'm not mistaken, and my code will work with it. It should work with a list of lists too, since their elements are accessed the same way. – Oct 19 '15 at 16:07
Gotcha. Two silly questions: 1) Will I be able to access one of the lists that the `dictionarize1` function creates? I hope so, but since I am totally new to `dictionaries` I need to understand; 2) How can the output of `dictionarize1` show the correct number of keys? I have `10000` keys and `10201` cells to match, so **which line tells the `dict` to "count 0 to 9999"**? – FaCoffee Oct 19 '15 at 16:16
How can the output of `dictionarize1` show the correct number of keys? I have 10000 keys and 10201 cells to match, so which line tells the dict to "count 0 to 9999"? – FaCoffee Oct 26 '15 at 14:51
It's quite simple: dictionaries are like lists that can use almost anything you want as a key. This means that you can, for example, create a dictionary that uses animals as keys and the number of their legs as values. – Nov 02 '15 at 23:22
I don't know why it doesn't indent; I hope you can follow it anyway: `dict = {}`, `dict['cat'] = 4`, `dict['bird'] = 2`, `dict['spider'] = 8`. So, to access the elements of the dictionary that the dictionarize functions create, you just write (referring to the sample code I posted above) `simba = dictionarize1(eighties)` and then `number = simba[x]`, where x can be any integer from 0 to (rows - 1)*(columns - 1) - 1. – Nov 02 '15 at 23:32

toine · Answer 2 · 2015-10-19T16:02:57.950

Let's say data is the original numpy.array with dimension dr and dc for rows and columns.

dr = data.shape[0]
dc = data.shape[1]

You could produce Keys as a function that returns the indices of interest, and Values as an array holding the mean of the 4 neighbouring cells. In that case, Keys is:

def Keys(x):
    # dc is in scope; each row of the array holds dc - 1 intersections
    xmod = x + x // (dc - 1)  # flat index of the upper-left cell
    return [xmod, xmod + 1, xmod + dc, xmod + 1 + dc]
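The index arithmetic can be checked by splitting x into a row and a column on the grid of intersections, which is dc - 1 wide. The sketch below does this with `divmod`. The name `keys_for` is hypothetical, and `dc` is passed in explicitly here instead of being read from the enclosing scope.

```python
def keys_for(x, dc):
    """Return the flat IDs of the four cells around intersection x.

    Hypothetical helper: dc is the number of columns of the data array,
    so each row of the array holds dc - 1 intersections.
    """
    row, col = divmod(x, dc - 1)   # position of x on the intersection grid
    top_left = row * dc + col      # flat ID of the upper-left cell
    return [top_left, top_left + 1, top_left + dc, top_left + dc + 1]


print(keys_for(4, 5))  # [5, 6, 10, 11]
```

Because the row is derived from the intersection grid instead of the cell grid, the first intersection of each new row starts at the first cell of that row and never wraps around from the previous row's last column.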

The dimension of Values is (dr-1) * (dc-1), since the last row and the last column are not included. We can compute it as a moving average and reshape to 1D later (inspiration from link):

Values = ((data[:-1,:-1] + data[1:,:-1] + data[:-1,1:] + data[1:,1:])/4).reshape((dr-1)*(dc-1))

Example:

dr = 3
dc = 5

In: np.array(range(dc*dr)).reshape((dr, dc))  # data
Out: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In: [Keys(x) for x in range((dr-1)*(dc-1))]
Out: 
    [[0, 1, 5, 6],
     [1, 2, 6, 7],
     [2, 3, 7, 8],
     [3, 4, 8, 9],
     [5, 6, 10, 11],
     [6, 7, 11, 12],
     [7, 8, 12, 13],
     [8, 9, 13, 14]]

In: Values
Out: array([ 3,  4,  5,  6,  8,  9, 10, 11])
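If dictionaries are preferred as the final output, the same slicing can feed two dictionary comprehensions. This is only a sketch under the assumptions of the example above (a 3x5 array whose values equal the cell IDs); `id_dict` and `mean_dict` are hypothetical names.

```python
import numpy as np

dr, dc = 3, 5
data = np.arange(dr * dc, dtype=float).reshape(dr, dc)

# Mean of every 2x2 window, flattened row by row.
values = ((data[:-1, :-1] + data[1:, :-1]
           + data[:-1, 1:] + data[1:, 1:]) / 4).ravel()

# Flat ID of the upper-left cell of every window.
cell_ids = np.arange(dr * dc).reshape(dr, dc)
top_left = cell_ids[:-1, :-1].ravel()

# Intersection number -> IDs of the four surrounding cells.
id_dict = {k: (int(t), int(t) + 1, int(t) + dc, int(t) + dc + 1)
           for k, t in enumerate(top_left)}

# Intersection number -> mean of the four surrounding cells.
mean_dict = {k: float(v) for k, v in enumerate(values)}

print(id_dict[4])    # (5, 6, 10, 11)
print(mean_dict[4])  # 8.0
```

The arithmetic stays vectorized in `numpy`; only the final conversion to dictionaries is done in plain Python, which costs memory proportional to the number of intersections.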

edited Oct 19 '15 at 16:02

answered Oct 19 '15 at 15:14

if your array is huge, having `Keys` as a function saves you a lot of memory, and using `numpy` slicing and vectorization saves you time. – toine Oct 19 '15 at 15:21
Well, it looks very nice, although it provides some wrapping if I am not wrong. The fifth line in your matrix output must read `[5,6,10,11]` instead of `[4,5,9,10]`. In my model no wrapping is allowed. Sorry for not having pointed this out earlier. – FaCoffee Oct 19 '15 at 15:21
correct, I changed the code - was an error in the integer division in function Keys line 1 – toine Oct 19 '15 at 15:25
My original array is made of `10201` values, so it is rather huge. But these lines sound a bit hard for me to understand, being a newbie. `Values` is not part of the `Keys` function, right? Also, I would like to have a dictionary as output, rather than a list. This will make me read the outcomes easier. – FaCoffee Oct 19 '15 at 15:31
For the cell of interest `X`, you get the `mean` value from `Values[X]` and the indices that this `mean` was built from with `Keys(X)`. These are not dictionaries, but I don't think you need them. – toine Oct 19 '15 at 15:37
I see your point. Anyhow, there are other dictionaries involved in the bigger picture, and they are useful as: 1) they associate the `[0,1,5,6]` neighborhood with the intersection ID (in this case, `0`); 2) Each intersection ID is associated with a value that comes from another computation, and I must be able to link the two; 3) another dictionary will be produced, matching the intersection ID with the avg value of the 4 neighboring cells. Therefore, dictionaries look catchy, as working with arrays will surely get me confused while analyzing data. – FaCoffee Oct 19 '15 at 15:47
`Keys` as a function does the same job as a dictionary; the only difference is that you call it with `()`. As for `Values`, it covers 3) and is indexed the same way as a dictionary. – toine Oct 19 '15 at 15:53
