Emulating non-rectangular arrays

Question

Often times you want the performance of arrays over linked lists while having not conforming to the requirement of having rectangular arrays.

As an example consider an hexagonal grid, here shown with the 1-distance neighbors of cell (3, 3) in medium gray and the 2-distance neighbors in light gray. Say we want an array that contains, for each cell, the indices of every 1- and 2-distance neighbor for that cell. One slight issue is that cells have a different amount of X-distance neighbors -- cells on the grid border will have fewer neighbors than cells closer to the grid center.

(We want an array of neighbor indices --- instead of a function from cell coordinates to neighbor indices --- for performance reasons.)

We can work around this problem by keeping track of how many neighbors each cell has. Say you have an array neighbors2 of size R x C x N x 2, where R is the number of grid rows, C for columns, and N is the maximum number of 2-distance neighbors for any cell in the grid. Then, by keeping an additional array n_neighbors2 of size R x C, we can keep track of which indices in neighbors2 are populated and which are just zero padding. For example, to retrieve the the 2-distance neighbors of cell (2, 5), we simply index into the array as such:

someNeigh = neighbors2[2, 5, 0..n_neighbors2[2, 5], ..]

someNeigh will be a n_neighbors2[2, 5] x 2 array (or view) of indicies, where someNeigh[0, 0] yields the row of the first neighbor, and someNeigh[0, 1] yields the column of the first neighbor and so forth. Note that the elements at the positions

neighbors2[2, 5, n_neighbors2[2, 5]+1.., ..]

are irrelevant; this space is just padding to keep the matrix rectangular.

Provided we have a function for finding the d-distance neighbors for any cell:

import           Data.Bits (shift)
rows, cols = (7, 7)
type Cell = (Int, Int)

generateNeighs :: Int -> Cell -> [Cell]
generateNeighs d cell1 = [ (row2, col2)
                            | row2 <- [0..rows-1]
                            , col2 <- [0..cols-1]
                            , hexDistance cell1 (row2, col2) == d]

hexDistance :: Cell -> Cell -> Int
hexDistance (r1, c1) (r2, c2) = shift (abs rd + abs (rd + cd) + abs cd) (-1)
  where
    rd = r1 - r2
    cd = c1 - c2

How can we create the aforementioned arrays neighbors2 and n_neighbors2? Assume we know the maximum amount of 2-distance neighbors N beforehand. Then it is possible to modify generateNeighs to always return lists of the same size, as we can fill up remaining entries with (0, 0). That leaves, as I see it, two problems:

We need a function to populate neighbors2 which operates not every individual index but on a slice, in our case it should fill one cell at a time.
n_neighbors2 should be populated simultaneously as neighbors2

A solution is welcome with either repa or accelerate arrays.

I didn't read through your question, sorry, but just looking at your image the arrays are perfectly rectangular, you just have a bit special neighbors relationship. looking at your image [skewed 30 degrees to the right](https://imgur.com/a/Vcq8dG1) it is easier to see how that relationship could be defined following the 6 line segments around the chosen center tile. Apologies if you already knew all this and wrote about it in the question (that I skipped over). — Will Ness, Sep 16 '18 at 14:34
@WillNess The question is not to store the grid as an array, but for each cell, a 'list' of its neighbors. That won't be rectangular because the cells have a different amount of neighbors. — tsorn, Sep 16 '18 at 15:35
why not have the grid in an array and just access the neighbors through the specialized indexing scheme? it'll still be O(1) per element. is the grid huge and you only need few of such neighborhoods, is that it? otherwise, why not? — Will Ness, Sep 16 '18 at 16:50
I'm almost certain that using an array of neighbor indices will be *slower* than calculating them on the fly, unless the grid is small. CPU cache is a tightly limited and extremely valuable resource; using some of it to hold values that can be computed in a couple cheap instructions doesn't sound good. — dfeuer, Sep 16 '18 at 19:12
@willness Developing the index scheme is a lot of work for the particular coordinate system (axial) which allows for storing the grid in an array without wasting space (the latter property is important to me for unrelated reasons) — tsorn, Sep 16 '18 at 20:16
@dfeuer Using the code in the post, generating the neighbors for a cell is O(n) but it is possible to do in constant time thus your solution is probably best. Writing the code to do it in O(n) is far less work than doing it in O(1) because the neighbor offsets differ between odd and even columns.That said I still would like a solution to the actual question in this post; emulating jagged arrays (and the creation of them) is a common problem in software development. — tsorn, Sep 16 '18 at 20:16
@willness (and even ignoring that, I would curious to know how to carry on the solution to the problem as stated) — tsorn, Sep 16 '18 at 20:23
I'd recommend you consider alternatives to the monolithic-array-with-some-dummy-entries approach. That's a very Matlab thing to do. I'd consider just lining up all the neighbours of all cells in one 1D unboxed vector, and then for each cell storing just an offset into that vector and the number of neighbours. This is much nicer to construct in a pure-functional manner, and it would also need less total memory, if the number of neighbours varies considerably. — leftaroundabout, Sep 16 '18 at 20:37
the indexing is very simple. you can derive it just by looking at the skewed image. then certainly generating the neighbors will take O(1) time per neighbor. you haven't answered about the comparative sizes of all cells vs the number of neighborhoods. — Will Ness, Sep 16 '18 at 20:47
@willness The size of the grid in the picture is the one that I will actually be using, 7x7, and all cells are equally interesting. — tsorn, Sep 16 '18 at 20:49
so there's no problem with using what I proposed. you have simple regular 2D array. each time you need to find neighbors for a cell, generate their indices according to the six green segments in the second picture, skip the invalid indices, access the array in O(1) time per neighbor. — Will Ness, Sep 16 '18 at 20:51
@willness No there is not. I'm not aware of any formula/equation for the set of n-distance neighbors for arbitrary `n`. Coding the offsets by hand is a lot of work since you must do it for every distance `n` that you need, twice, since the offsets differ for odd and even columns. — tsorn, Sep 16 '18 at 20:55
@willness Could you make an answer post. I don't know which second picture you're referring to (there's only 1 in the post) nor do I grok your code. By `filterIsvalid` I assume you mean filtering by grid bounds. I don't get the directions you're referring to; how n-distance neighbors are traversed for a given cell and arbitrary n. — tsorn, Sep 16 '18 at 21:46
Looks like a bunch of cubes to me, is it easier to approach the problem in a 3D coordinate system? — dopamane, Sep 17 '18 at 13:39
@davidcox That leads to wasted space when storing the grid in a rectangular array. See https://www.redblobgames.com/grids/hexagons/ for an excellent resource on hexagonal grids. — tsorn, Sep 17 '18 at 14:56

Will Ness · Accepted Answer · 2018-09-16T22:14:47.167

Here's you picture skewed 30 degrees to the right:

As you can see your array is actually perfectly rectangular.

The indices of a neighborhood's periphery are easily found as six straight pieces around the chosen center cell, e.g. (imagine n == 2 is the distance of the periphery from the center (i,j) == (3,3) in the picture):

periphery n (i,j) = 
   --     2 (3,3)
  let 
    ((i1,j1):ps1) = reverse . take (n+1) . iterate (\(i,j)->(i,j+1)) $ (i-n,j) 
    --                                                                 ( 1, 3)
    ((i2,j2):ps2) = reverse . take (n+1) . iterate (\(i,j)->(i+1,j)) $ (i1,j1) 
    --                                                                 ( 1, 5)
    .....
    ps6           = .......                                          $ (i5,j5)
  in filter isValid (ps6 ++ ... ++ ps2 ++ ps1)

The whole neighborhood is simply

neighborhood n (i,j) = (i,j) : concat [ periphery k (i,j) | k <- [1..n] ]

For each cell/distance combination, simply generate the neighborhood indices on the fly and access your array in O(1) time for each index pair.

tsorn · Answer 2 · 2018-09-18T10:45:55.817

Writing out the answer from @WillNess in full, and incorporating the proposal from @leftroundabout to store indecies in a 1D vector instead, and we get this:

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Array, DIM1, DIM2, DIM3, Z(..), (:.)(..), (!), fromList, use)

rows = 7
cols = 7

type Cell = (Int, Int)

(neighs, nNeighs) = generateNeighs

-- Return a vector of indices of cells at distance 'd' or less from the given cell
getNeighs :: Int -> Cell -> Acc (Array DIM1 Cell)
getNeighs d (r,c) = A.take n $ A.drop start neighs
  where
    start = nNeighs ! A.constant (Z :. r :. c :. 0)
    n = nNeighs ! A.constant (Z :. r :. c :. d)

generateNeighs :: (Acc (Array DIM1 Cell), Acc (Array DIM3 Int))
generateNeighs = (neighsArr, nNeighsArr)
  where
    idxs = concat [[(r, c) | c <- [0..cols-1]] | r <- [0..rows-1]]
    (neighsLi, nNeighsLi, n) = foldl inner ([], [], 0) idxs
    neighsArr = use $ fromList (Z :. n) neighsLi
    nNeighsArr = use $ fromList (Z :. rows :. cols :. 5) nNeighsLi
    inner (neighs', nNeighs', n') idx = (neighs' ++ cellNeighs, nNeighs'', n'')
      where
        (cellNeighs, cellNNeighs) = neighborhood idx
        n'' = n' + length cellNeighs
        nNeighs'' = nNeighs' ++ n' : cellNNeighs

neighborhood :: Cell -> ([Cell], [Int])
neighborhood (r,c) = (neighs, nNeighs)
  where
    neighsO = [ periphery d (r,c) | d <- [1..4] ]
    neighs = (r,c) : concat neighsO
    nNeighs = tail $ scanl (+) 1 $ map length neighsO

periphery d (r,c) =
  -- The set of d-distance neighbors form a hexagon shape. Traverse each of
  -- the sides of this hexagon and gather up the cell indices.
  let 
    ps1 = take d . iterate (\(r,c)->(r,c+1))   $ (r-d,c)
    ps2 = take d . iterate (\(r,c)->(r+1,c))   $ (r-d,c+d)
    ps3 = take d . iterate (\(r,c)->(r+1,c-1)) $ (r,c+d)
    ps4 = take d . iterate (\(r,c)->(r,c-1))   $ (r+d,c)
    ps5 = take d . iterate (\(r,c)->(r-1,c))   $ (r+d,c-d)
    ps6 = take d . iterate (\(r,c)->(r-1,c+1)) $ (r,c-d)
  in filter isValid (ps6 ++ ps5 ++ ps4 ++ ps3 ++ ps2 ++ ps1)


isValid :: Cell -> Bool
isValid (r, c)
  | r < 0 || r >= rows = False
  | c < 0 || c >= cols = False
  | otherwise = True

BTW the reason I used `reverse` was that I didn't want to call `last` to find the next starting point which is the next vertex of the fence polygon. But if we calculate each of the six vertices of the polygon explicitly (which isn't hard; I was just being lazy with that), we don't need all that, and can eliminate the reverses, and also convert the `take...iterate` to simple enumerations in the style of `zip [i,i+1..i+d-1] (repeat j)`. :) Also seems the d==1 case should be special cased as well (there's no enumerations, just the vertices). — Will Ness, Sep 17 '18 at 17:03
@WillNess I incorporated your first suggestion in the post. I also benchmarked that code (with suggestion #1) versus also using `zip`s and `repeat`s. It turns out the latter approach is about 50 % slower according to Criterion (testing code: https://github.com/tsoernes/haskelldca/blob/master/test/TestGridneighs.hs) — tsorn, Sep 18 '18 at 11:42
you're right, I later thought about this too (that there's no real need to special case d==1) but wasn't on-line to comment. so `iterate` compiles into a faster code than enumerations, interesting. Ah, indeed, `zip` is not a "good producer" (i.e. it doesn't fuse) but `iterate` is; so this makes sense. Did you make sure to compile with the -O2 switch BTW? — Will Ness, Sep 18 '18 at 13:59

score 0 · Answer 3 · answered Sep 16 '18 at 20:47

This can be by using the permute function to fill the neighbors for 1 cell at a time.

import Data.Bits (shift)
import Data.Array.Accelerate as A
import qualified Prelude as P
import Prelude hiding ((++), (==))

rows = 7
cols = 7
channels = 70

type Cell = (Int, Int)

(neighs, nNeighs) = fillNeighs

getNeighs :: Cell -> Acc (Array DIM1 Cell)
getNeighs (r, c) = A.take (nNeighs ! sh1) $ slice neighs sh2
  where
    sh1 = constant (Z :. r :. c)
    sh2 = constant (Z :. r :. c :. All)

fillNeighs :: (Acc (Array DIM3 Cell), Acc (Array DIM2 Int))
fillNeighs = (neighs2, nNeighs2)
  where
    sh = constant (Z :. rows :. cols :. 18) :: Exp DIM3
    neighZeros = fill sh (lift (0 :: Int, 0 :: Int)) :: Acc (Array DIM3 Cell)
    -- nNeighZeros = fill (constant (Z :. rows :. cols)) 0 :: Acc (Array DIM2 Int)
    (neighs2, nNeighs2li) = foldr inner (neighZeros, []) indices
    nNeighs2 = use $ fromList (Z :. rows :. cols) nNeighs2li
    -- Generate indices by varying column fastest. This assures that fromList, which fills
    -- the array in row-major order, gets nNeighs in the correct order.
    indices = foldr (\r acc -> foldr (\c acc2 -> (r, c):acc2 ) acc [0..cols-1]) [] [0..rows-1]
    inner :: Cell
      -> (Acc (Array DIM3 Cell), [Int])
      -> (Acc (Array DIM3 Cell), [Int])
    inner cell (neighs, nNeighs) = (newNeighs, n : nNeighs)
      where
        (newNeighs, n) = fillCell cell neighs


-- Given an cell and a 3D array to contain cell neighbors,
-- fill in the neighbors for the given cell
-- and return the number of neighbors filled in
fillCell :: Cell -> Acc (Array DIM3 Cell) -> (Acc (Array DIM3 Cell), Int)
fillCell (r, c) arr = (permute const arr indcomb neighs2arr, nNeighs)
  where
    (ra, ca) = (lift r, lift c) :: (Exp Int, Exp Int)
    neighs2li = generateNeighs 2 (r, c)
    nNeighs = P.length neighs2li
    neighs2arr = use $ fromList (Z :. nNeighs) neighs2li
    -- Traverse the 3rd dimension of the given cell
    indcomb :: Exp DIM1 -> Exp DIM3
    indcomb nsh = index3 ra ca (unindex1 nsh)


generateNeighs :: Int -> Cell -> [Cell]
generateNeighs d cell1 = [ (row2, col2)
                            | row2 <- [0..rows]
                            , col2 <- [0..cols]
                            , hexDistance cell1 (row2, col2) P.== d]


-- Manhattan distance between two cells in an hexagonal grid with an axial coordinate system
hexDistance :: Cell -> Cell -> Int
hexDistance (r1, c1) (r2, c2) = shift (abs rd + abs (rd + cd) + abs cd) (-1)
  where
    rd = r1 - r2
    cd = c1 - c2

Emulating non-rectangular arrays

3 Answers3