2

Hi raster warriors!

I am having a headache with my results after months of manipulating my data set. I am new with R and spatial analysis but very happy in my learning process.

Here my issue: Once I apply the zonal function to a set of 5 raster objects (.tif), somehow I get duplicated values in zones where I suppose to get zero. Also, some values appear in other cells.

Here my code:

folder <- "my_directory"

#for one country
ET<-raster("my_directory/file_ET.tif")
ED<-raster("my_directory/file_ED.tif")
SH<-raster("my_directory/file_SH.tif")
SC<-raster("my_directory/file_SC.tif") 
WH<-raster("my_directory/file_WH.tif")

#zones
lowbound<- seq(0,cellStats(ED,"max", na.rm=TRUE),1000)
upbound<-lowbound+1000
band<-upbound/1000
rclmat <- as.matrix(cbind(lowbound,upbound,band)) #to be used on reclassify
rclmat<-rclmat[1:length(rclmat[,1])-1,]
edb<- reclassify(ED, rclmat)
# zonal statistics # I already tried this: na.rm=FALSE, !anyDuplicated(sum_ET)) to avoid duplication – did not work 
sum_ET<-zonal(ET,edb,"sum") #,  na.rm=FALSE, !anyDuplicated(sum_ET))

stats<-cbind(band,upbound,sum_ET[,2], sum_SH[,2], sum_SC[,2], sum_WH[,2],perc_SH,perc_SC,perc_WH,perc_Tot)

here my results:

    upbound ET
1   1000    10523272.53
2   2000    5156046.27
3   3000    5053895.54
4   4000    4796505.3
5   5000    4392162.97
6   6000    4156065.87
31  31000   10523272.53
32  32000   5156046.27
33  33000   5053895.54
34  34000   4796505.3
35  35000   4392162.97
36  36000   4156065.87

From 31 to 36 are the same as the ones from 1 to 6.

Here results of one of my colleagues – to whom I am comparing with

            upbound ET
1   1000    10523272.53
2   2000    5156046.27
3   3000    5053895.54
4   4000    4796505.3
5   5000    4392162.97
6   6000    4156065.87
31  31000   0
32  32000   21247.26
33  33000   0
34  34000   0
35  35000   0
36  36000   23877.74

As you can see I get duplicated values.

Inputs can be found here ET, ED files

Any help is very much appreciated. Many thanks

Diego

Diego Moya
  • 123
  • 10

1 Answers1

1

First let me explain what is happening in your code on a simple example:

# Create "zone" variable
zone=1:10

# Create ET dataframe, that does not have some zones:
dET <- data.frame(zone=c(1:5,7,8), ET=runif(7, min=1, max=10)) 
dET
#   zone       ET
# 1    1 5.776128
# 2    2 9.067579
# 3    3 9.874737
# 4    4 7.846662
# 5    5 3.588964
# 6    7 3.843509
# 7    8 8.916714

#merge zone vector and the second column from dET dataframe:
# As you can see some values in the second columnd of the result are repeated to match
# the length of the zone vector and the value that originally belonged to 7th zone was moved into 6th zone
# since there was no 6th zone in the dataframe
cbind(zone, dET[,2])
#      zone         
# [1,]    1 5.776128
# [2,]    2 9.067579
# [3,]    3 9.874737
# [4,]    4 7.846662
# [5,]    5 3.588964
# [6,]    6 3.843509
# [7,]    7 8.916714
# [8,]    8 5.776128
# [9,]    9 9.067579
# [10,]   10 9.874737

#Instead you should use merge and then fill the missing values with zeros:
result <- merge(data.frame(zone=zone), dET, by="zone", all=TRUE)
result
#    zone       ET
# 1     1 5.776128
# 2     2 9.067579
# 3     3 9.874737
# 4     4 7.846662
# 5     5 3.588964
# 6     6       NA
# 7     7 3.843509
# 8     8 8.916714
# 9     9       NA
# 10   10       NA

result$ET[is.na(result$ET)] <- 0
#    zone       ET
# 1     1 5.776128
# 2     2 9.067579
# 3     3 9.874737
# 4     4 7.846662
# 5     5 3.588964
# 6     6 0.000000
# 7     7 3.843509
# 8     8 8.916714
# 9     9 0.000000
# 10   10 0.000000

Here is how you can correct your code. Here I am using only 2 of your files ED and ET. All others should be modified the same way :

library(raster)
library(rgdal)

ED<-raster("file_ED.tif")
ET<-raster("file_ET.tif")

lowbound<- seq(0,cellStats(ED,"max", na.rm=TRUE),1000)
upbound<-lowbound+1000
band<-upbound/1000
rclmat <- as.matrix(cbind(lowbound,upbound,band)) #to be used on reclassify

# Commented the following line since cellStats(ED,"max", na.rm=TRUE)=35125.57
# so you should not remove the values in the range 35K-36K
#rclmat<-rclmat[1:length(rclmat[,1])-1,]

ener_dens_band<- reclassify(ED, rclmat)
sum_ET<-zonal(ET,ener_dens_band,"sum")

# Let's look at the tail of the result:
tail(sum_ET)
#       zone      sum
# [25,]   26 27011.60
# [26,]   27 53905.17
# [27,]   28 18490.15
# [28,]   29 19322.07
# [29,]   32 21247.26
# [30,]   36 23877.74

# As you can see some zones are missing: 30, 31, 33-35
# So they need to be added
sum_ET <- merge(data.frame(zone=1:36), sum_ET, by="zone", all=TRUE)
sum_ET$sum[is.na(sum_ET$sum) ] <- 0

tail(sum_ET, n=10)
#    zone      sum
# 27   27 53905.17
# 28   28 18490.15
# 29   29 19322.07
# 30   30     0.00
# 31   31     0.00
# 32   32 21247.26
# 33   33     0.00
# 34   34     0.00
# 35   35     0.00
# 36   36 23877.74

You need to repeat the same for your other columns and then you can use cbind to put them all together into a single dataframe.

Katia
  • 3,784
  • 1
  • 14
  • 27