How to fill NA values in raster with nearest neighbor based on categorical raster data in R?

Question

I am using geological data that comes as shapefiles. However, I need them as rasters. Therefore, I convert them to categorical rasters using Terra. The data is filled with NAs for lakes, rivers, and other open water areas. I need to fill these NA cells with the nearest neighbor category or the most often seen category in the neighboring cells. Unfortunelty, focal calculates values rather than just extending the categories. I have also tried to fillholes with Terra using the polygons, but again does not compeltley fill the gaps.

For example:

r <- raster(matrix(c(rep(1, 10), 11:16), nrow = 8, ncol = 8))
r[1,1] <- NA

The NA value should be filled with 1 because the nearest values are 1.

In my real example this would be a factor. I have been stuck on this problem for a while, so any help would be appreciated.

I have tried terra::fillholes on the polygon data and terra::focal. The fillholes did not completely close the gaps and the focal calcualted values. I attempted to select modal for the focal function but it did not fill the values with the nearest sample. I cannot do interpolations because it is categorical and there are no other rasters to predict based on.

Edit:

Using the focal function in terra with the modal option did work; however, it removes the levels in your factor. If you reassign the levels using the levels command in terra, it works great! It is just something to be aware of that the categories will be lost and the output will be an integer. I kept my original data so I could just call

 levels(new.raster) <- terra::levels(old.raster)

to get the levels.

I am not an expert, but mabye you could see [here](https://stackoverflow.com/questions/66471939/how-to-do-nearest-neighbour-interpolation-in-r-for-a-raster) or [here](https://stackoverflow.com/questions/45641168/fill-in-gaps-e-g-not-single-cells-of-na-values-in-raster-using-a-neighborhood) — Dimitri, Apr 05 '23 at 09:36
Thanks! However, focal calculates values based on a function. I already attempted to use it, but I need a simple extension of raster values (categorical) into the new areas. All the methods that I have found and tested attempt to treat the categories as values which is inappropriate. — asnead94, Apr 05 '23 at 16:59

score 1 · Answer 1 · answered Apr 05 '23 at 21:19

1

You can do that with focal (here using Andrew Chisholm's example data, and the "terra" package, the replacement of "raster")

library(terra)

set.seed(0)
r <- rast(ncol=10, nrow=10)
values(r) <- round(runif(ncell(r)) * 2) + 1
r[c(1,7,12,63)] <- NA

f <- focal(r, 3, "modal", na.policy="only")
plot(f, col=c('red', 'green', 'blue'))

answered Apr 05 '23 at 21:19

Robert Hijmans

40,301
4
55
63

Thanks so much for the help Robert. Unfortunetly, the modal option is not providing the expected result. When I compare the map before focal and after, it seems that focal with modal is just filling all the wholes with the same value even in spots were there is no way that is true. – asnead94 Apr 17 '23 at 02:22
Hey Robert, I wanted to update you. You were right! I think that there was a problem from when I rasterized my shapefile. When I looked at my levels, I got a level called that was not plotting before I did focal. When I did focal, it converted all of the values to the integer which was the highest value. – asnead94 Apr 17 '23 at 03:02

score 0 · Answer 2 · answered Apr 05 '23 at 20:06

Here's some code that might do what you want. I used dplyr to make things easier for me.

It uses adjacent and creates a data frame of the indexes of the adjacent cells to the NULLs. The value of the adjacent cells is determined and the most common adjacent to each NULL location is found. This is then copied into the raster. It's quite simplistic because there could ties when determining the most common adjacent value. Rasters wrap so adjacent cells can be on the other side of the world so this needs to be understood.

library(raster)
library(terra)
library(dplyr)

# Make some random data with some suitable NULLs
set.seed(0)
r <- raster(ncol=10, nrow=10)
values(r) <- round(runif(ncell(r)) * 2) + 1
r[1] <- NA
r[7] <- NA
r[12] <- NA
r[63] <- NA

# Find where the NULLs are
null_cells <- which(is.na(values(r)))

terra::plot(r, col=c('red', 'green', 'blue'))

# Find the adjacent cells to the NULLs and make a data frame
adjacents_df <- data.frame(terra::adjacent(r, null_cells, directions = 8, pairs=TRUE)) %>% arrange(from, to)

# Add the value of the raster cells that are adjacent to the NULLs
adjacents_df$to_value <- values(r)[adjacents_df$to]

# Work out the most common value of the adjacent cells for each NULL value
new_values <- 
    adjacents_df %>% 
    group_by(from) %>% 
    count(to_value, sort = T) %>% 
    arrange(from, .by_group = T) %>% 
    group_by(from) %>% 
    slice_head(n=1) %>% 
    select(from, to_value)

# Copy the new values into the raster
values(r)[new_values$from] = new_values$to_value

terra::plot(r, col=c('red', 'green', 'blue'))

The raster before is here

And the raster after is here

How to fill NA values in raster with nearest neighbor based on categorical raster data in R?

2 Answers2