I have a data frame df with a point every 0.1 units:

df <- expand.grid(x = seq(0, 20, by = .1),
                  y = seq(0, 20, by = .1))

I defined a second data frame, grid, with a point every 4 units:

grid <- expand.grid(xg = seq(0, 20, by = 4),
                    yg = seq(0, 20, by = 4))

I would like to use the points of grid as the nodes of a grid and determine which points in df fall inside each of its cells. The cell information should be added to df as a new column holding a string of the form i.j for each point, where i and j are the row and column index of the grid cell, respectively. The new column should be NA for df points that lie on the grid lines.

For example, all df points with 0 < x < 4 and 0 < y < 4 should be labeled 1.1, whereas points with 8 < x < 12 and 16 < y < 20 should be labeled 3.5, and so on.

The ideal solution should also work for grids of a different size, e.g. expand.grid(xg = seq(0, 20, by = 2), yg = seq(0, 20, by = 2)).

Thanks for your help.

Ndr

1 Answer

This is a bit hacky, but you could create the i.j index in grid, join it to df, and then fill in the NAs within each grid chunk:

df <- expand.grid(x = seq(0, 20, by = .1),
                  y = seq(0, 20, by = .1))
head(df)
#>     x y
#> 1 0.0 0
#> 2 0.1 0
#> 3 0.2 0
#> 4 0.3 0
#> 5 0.4 0
#> 6 0.5 0

grid <- expand.grid(xg = seq(0, 20, by = 4),
                    yg = seq(0, 20, by = 4))
head(grid)
#>   xg yg
#> 1  0  0
#> 2  4  0
#> 3  8  0
#> 4 12  0
#> 5 16  0
#> 6 20  0

# Make row/col indices
grid$i <- as.integer(factor(grid$xg))
grid$j <- as.integer(factor(grid$yg))
grid$i.j <- paste(grid$i, grid$j, sep = '.')
grid$i <- NULL
grid$j <- NULL

# Merge indices to df
indexed <- merge(df, grid, by.x = c('x', 'y'), by.y = c('xg', 'yg'), all = TRUE)
head(indexed)
#>   x   y i.j
#> 1 0 0.0 1.1
#> 2 0 0.1  NA
#> 3 0 0.2  NA
#> 4 0 0.3  NA
#> 5 0 0.4  NA
#> 6 0 0.5  NA

# Fill the rows between grid nodes with the last non-NA label
for (i in 2:nrow(indexed)) {
  if (is.na(indexed$i.j[i])) {
    indexed$i.j[i] <- indexed$i.j[i - 1]
  }
}
head(indexed)
#>   x   y i.j
#> 1 0 0.0 1.1
#> 2 0 0.1 1.1
#> 3 0 0.2 1.1
#> 4 0 0.3 1.1
#> 5 0 0.4 1.1
#> 6 0 0.5 1.1

This only works if the df sequences intersect the grid sequences, e.g. df[8081, ] = (4.0, 4.0) is in grid and df[1, ] is also in grid.
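
One way to check that condition before merging is sketched below. The helper `key()` is hypothetical (it is not part of the code above), and rounding to one decimal is an assumption that matches the 0.1 step used here; it guards against floating-point noise from seq() hiding a match.

```r
df <- expand.grid(x = seq(0, 20, by = 0.1),
                  y = seq(0, 20, by = 0.1))
grid <- expand.grid(xg = seq(0, 20, by = 4),
                    yg = seq(0, 20, by = 4))

# Hypothetical helper: build a text key from rounded coordinates so that
# tiny floating-point differences do not prevent a match
key <- function(a, b) paste(round(a, 1), round(b, 1))

# TRUE for every grid node that also occurs as a point in df
nodes_in_df <- key(grid$xg, grid$yg) %in% key(df$x, df$y)

# The merge-and-fill approach relies on this being TRUE
all(nodes_in_df)
```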

Also, the for loop is pretty slow. You could try converting it to an Rcpp loop, or there may be a more efficient way to do a non-equi join with {data.table} or {sqldf}.
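
As an alternative that avoids both the join and the loop, here is a minimal base R sketch using findInterval(). It is a different technique from the merge-and-fill above, not a faster version of it. Note that the forward fill leaves no NA on the grid lines and appears to carry the last label of one x value over to the start of the next, whereas this sketch labels each point directly from its coordinates. The function name `label_cells()` and the `digits` argument are hypothetical, and rounding to 6 decimals is an assumption that suits coordinates generated with seq().

```r
# Hypothetical helper: label each point with "i.j", where i and j are the
# cell indices along x and y; points on a grid line get NA.
# xb and yb are the sorted grid-line positions along each axis.
label_cells <- function(x, y, xb, yb, digits = 6) {
  # Round away floating-point noise so that e.g. 40 * 0.1 compares equal to 4
  x <- round(x, digits)
  y <- round(y, digits)

  # findInterval() returns k such that xb[k] <= x < xb[k + 1]
  i <- findInterval(x, xb)
  j <- findInterval(y, yb)

  # Points on any grid line (including the outer border) are set to NA
  on_line <- x %in% xb | y %in% yb
  ifelse(on_line, NA_character_, paste(i, j, sep = "."))
}

df <- expand.grid(x = seq(0, 20, by = 0.1),
                  y = seq(0, 20, by = 0.1))
grid <- expand.grid(xg = seq(0, 20, by = 4),
                    yg = seq(0, 20, by = 4))

# The grid lines are the unique node coordinates along each axis
df$i.j <- label_cells(df$x, df$y,
                      xb = sort(unique(grid$xg)),
                      yb = sort(unique(grid$yg)))
```

Because the grid lines are taken from the unique node coordinates, a grid built with by = 2 should work the same way without further changes. This sketch assumes all df points lie within the outer border of the grid; points outside it would need separate handling.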

rdh