I have a large grid of cells. Each cell has an ID (p1), a cell value (p3) and coordinates in real-world units (X, Y). This is what the first 10 rows/cells look like:

      p1     p2          p3     X  Y
0      0     0.0         0.0    0  0
1      1     0.0         0.0  100  0
2      2     0.0        12.0  200  0
3      3     0.0         0.0  300  0
4      4     0.0        70.0  400  0
5      5     0.0        40.0  500  0
6      6     0.0        20.0  600  0
7      7     0.0         0.0  700  0
8      8     0.0         0.0  800  0
9      9     0.0         0.0  900  0

The neighbouring cells of cell i in p1 can be determined as (i-500+1, i-500-1, i-1, i+1, i+500+1, i+500-1). For example, the cell with p1 = 5 has neighbours 4, 6, 504, 505, 506 (these are row IDs from the table above, i.e. p1 values).

What I am trying to do is: for a chosen row i in p1, I would like to find all neighbours within a chosen distance from i and sum their p3 values.

I tried to apply this solution (link), but I don't know how to incorporate the distance parameter. The cell value can be retrieved with df.iloc, but the steps before that are a bit tricky for me.

Can you give me any advice?

EDIT: Using the solution from Thomas, and with a DataFrame called CO:

      p3
0     45
1    580
2  12000
3  12531
4  22456

I'd like to add another column computed from the values in the p3 column:

CO['new'] = format(sum_neighbors(data, CO['p3']))

But it doesn't work. If I pass a number instead of the column CO['p3'], it works like a charm. How can I pass the values from the p3 column into the format call automatically?

SOLVED: It worked with:

CO['new'] = CO.apply(lambda row: sum_neighbors(data, row.p3), axis=1)
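
The first attempt fails because CO['p3'] is a whole column (a Series), whereas sum_neighbors expects one integer position at a time. As a minimal sketch of an alternative to the row-wise apply, Series.map calls the function once per value. The toy `data`, the toy `CO` and the simplified one-dimensional `sum_neighbors` below are hypothetical stand-ins, not the real grid or the function from the answer.

```python
import pandas as pd

# Hypothetical toy data: `data` holds the grid values, `CO` holds the cell IDs to look up.
data = pd.DataFrame({'p3': [1.0, 2.0, 3.0, 4.0, 5.0]})
CO = pd.DataFrame({'p3': [0, 2, 4]})

def sum_neighbors(data, i, col='p3'):
    # Simplified 1-D stand-in: only the cells directly before and after i.
    idx = [n for n in (i - 1, i + 1) if 0 <= n < len(data)]
    return data.iloc[idx][col].sum()

# Series.map passes each scalar in the column to the function, one at a time.
CO['new'] = CO['p3'].map(lambda i: sum_neighbors(data, i))
print(CO)
```

Because only one column is needed, mapping over that column avoids building a full row object for every call, which is what apply with axis=1 does.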
energyMax
  • This question is pretty unclear. A p1 of 5 has neighbors 4, 6, 504, 505, 506, but you haven't really given us any indication of how p3 changes with p1, because the neighbor values you displayed were relative to p1. So how do we know how p3 should change in relation to p1? – d_kennetz Oct 18 '18 at 14:27

1 Answer


Solution:

import numpy as np
import pandas

# Generating toy data
N = 10
data = pandas.DataFrame({'p3': np.random.randn(N)})
print(data)

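# Finding neighbours (offsets as given by the OP for a 500-wide grid)
def get_candidates(i):
  return [i-500+1, i-500-1, i-1, i+1, i+500+1, i+500-1]
def keep_valid(neighbors, N):  # keep only IDs that exist in the data
  return [n for n in neighbors if 0 <= n < N]
def get_neighbors(i, N):
  return keep_valid(get_candidates(i), N)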

print("Neighbors of 5: {}".format(get_neighbors(5, len(data))))

# Summing p3 on neighbors
def sum_neighbors(data, i, col='p3'):
  return data.iloc[get_neighbors(i, len(data))][col].sum()

print("p3 sum on neighbors of 5: {}".format(sum_neighbors(data, 5)))

Output:

         p3
0 -1.106541
1 -0.760620
2  1.282252
3  0.204436
4 -1.147042
5  1.363007
6 -0.030772
7 -0.461756
8 -1.110459
9 -0.491368

Neighbors of 5: [4, 6]

p3 sum on neighbors of 5: -1.1778133703169344
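
One caveat with plain offset arithmetic: if IDs run row by row, i-1 and i+1 cross into the previous or next row at the grid edges (for i = 500, i-1 is 499, the last cell of the row above). A possible way around this, and a way to add a distance parameter, is to convert the ID to (row, column) first. The sketch below is only an illustration under stated assumptions: the grid is 500 cells wide and 500 high, IDs are assigned row by row, and "distance" means a number of grid steps in any direction, diagonals included. `cells_within`, `WIDTH` and `HEIGHT` are hypothetical names.

```python
WIDTH = 500   # assumption: cells per row, inferred from the +/-500 offsets
HEIGHT = 500  # assumption: number of rows

def cells_within(i, radius, width=WIDTH, height=HEIGHT):
    """IDs of all cells at most `radius` grid steps from cell i, excluding i itself."""
    row, col = divmod(i, width)
    ids = []
    # Clamp the row and column ranges so the window never leaves the grid.
    for r in range(max(row - radius, 0), min(row + radius, height - 1) + 1):
        for c in range(max(col - radius, 0), min(col + radius, width - 1) + 1):
            if (r, c) != (row, col):
                ids.append(r * width + c)
    return ids

print(cells_within(5, 1))    # cell 5 is in the first row, so there is no row above it
print(cells_within(500, 1))  # first cell of the second row: no wrap-around to 499
```

With a radius of 1 this yields 4, 6, 504, 505 and 506 for cell 5. On a full-size grid the sum would then be data.iloc[cells_within(i, radius)]['p3'].sum(), in the same style as sum_neighbors above.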

Notes:

  • I assumed p1 was range(N) as seemed to be implied (so we don't need it at all).
  • I don't think that 505 is a neighbour of 5 given the list of neighbors of i defined by the OP.
  • The grid is 500 cells wide and of similar height. This is why the neighbour cells of 5 are the ones I described. Otherwise, great solution. – energyMax Oct 18 '18 at 17:14
  • On the other hand, where would it be possible to include the conditions 1) sum values only if the p3 value is > 2 and 2) the distance between cells is at most 1000? – energyMax Oct 18 '18 at 18:21
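
One way the two conditions from the last comment could be combined is a boolean mask over the X and Y columns. This is a rough sketch, not the answerer's method. It assumes the DataFrame index equals p1, that distance means straight-line distance in the units of X and Y, and that the chosen cell itself is left out of the sum. The 5x5 toy grid, its made-up p3 values and the name `sum_within` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy 5x5 grid with 100-unit spacing (assumed to mirror the layout in the question).
width, height, step = 5, 5, 100
ids = np.arange(width * height)
data = pd.DataFrame({
    'p1': ids,
    'p3': (ids % 4).astype(float),   # made-up values 0, 1, 2, 3, 0, ...
    'X': (ids % width) * step,
    'Y': (ids // width) * step,
})

def sum_within(data, i, max_dist=1000, min_value=2, col='p3'):
    """Sum `col` over cells within max_dist of cell i whose value exceeds min_value."""
    x0, y0 = data.loc[i, ['X', 'Y']]                  # assumes index == p1
    dist = np.hypot(data['X'] - x0, data['Y'] - y0)   # straight-line distance to cell i
    mask = (dist <= max_dist) & (data[col] > min_value) & (data.index != i)
    return data.loc[mask, col].sum()

print(sum_within(data, 12, max_dist=100))
```

Each call scans the whole table, which is simple but does work proportional to the number of cells per query.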