My goal is to run a random forest classification of agricultural land-use types (crop classification). I have several ground-truth points for all classes. Furthermore, I have 37 raster files (.tif), each with the same 12 bands and the same extent; each file represents one date in the time series. The time series is NOT regular (the intervals between acquisition dates vary).
The following shows the files, the dates and band names, plus the first file read with terra:
> files <- list.files("C:/temp/final2",full.names = T,pattern = ".tif$",recursive = T)
> files[1:3]
[1] "C:/temp/final2/20190322T100031_20190322T100524_T33UXP.tif" "C:/temp/final2/20190324T095029_20190324T095522_T33UXP.tif"
[3] "C:/temp/final2/20190329T095031_20190329T095315_T33UXP.tif"
> dates <- as.Date(substr(basename(files),1,8),"%Y%m%d")
> band_names <- c("B02","B03","B04","B05","B08","B11","B12","NDVI","NDWI","SAVI")
> rast(files[1])
class : SpatRaster
dimensions : 386, 695, 12 (nrow, ncol, nlyr)
resolution : 10, 10 (x, y)
extent : 634500, 641450, 5342460, 5346320 (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 33N (EPSG:32633)
source : 20190322T100031_20190322T100524_T33UXP.tif
names : B2, B3, B4, B5, B6, B7, ...
I want to extract the values for every date and band. This should result in a dataframe with the observed variables and the respective class for each point (see below). With this dataframe I want to train a random forest model in order to predict the crop classes across the rasters (resulting in a single raster layer with classes as values).
The following structure (copied from https://gdalcubes.github.io/source/tutorials/vignettes/gc03_ML_training_data.html) is what I need as observed values, which serve as the training data for the rf model.
## FID time B2 ... more bands ... and class of respective FID
## 1 16 2018-01-01 13.33
## 2 17 2018-01-01 13.63
## 3 18 2018-01-01 13.33
## 4 19 2018-01-01 12.15
## 5 20 2018-01-01 14.73
## 6 21 2018-01-01 15.91
## 7 16 2018-01-09 12.23
## 8 17 2018-01-09 12.15
## 9 18 2018-01-09 12.07
## 10 19 2018-01-09 10.19
## 11 20 2018-01-09 9.83
I (1) read all the rasters into a list called 'cube' and
(2) combined all the SpatRasters in the list into one SpatRaster.
> cube <- c()
> for (file in files){
+ ras <- rast(file)
+ cube<-c(cube,ras)
+ }
> names(cube) <- dates
> cubef <- rast(cube)
> cubef
class : SpatRaster
dimensions : 386, 695, 444 (nrow, ncol, nlyr)
resolution : 10, 10 (x, y)
extent : 634500, 641450, 5342460, 5346320 (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 33N (EPSG:32633)
sources : 20190322T100031_20190322T100524_T33UXP.tif (12 layers)
20190324T095029_20190324T095522_T33UXP.tif (12 layers)
20190329T095031_20190329T095315_T33UXP.tif (12 layers)
... and 34 more source(s)
names : 2019-03-22_1, 2019-03-22_2, 2019-03-22_3, 2019-03-22_4, 2019-03-22_5, 2019-03-22_6, ...
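As an aside, I later noticed that terra seems to accept the whole file vector in one call, which may replace the loop above (a sketch, not verified against my data; it assumes all files share extent and resolution, and reuses my `files` and `dates` vectors):

```r
library(terra)

# read all 37 files as one multi-layer SpatRaster (37 x 12 = 444 layers)
cubef <- rast(files)

# label each layer as "<date>_<band index>" so date and band stay recoverable
n_bands <- nlyr(rast(files[1]))                      # 12 layers per file
names(cubef) <- paste(rep(format(dates), each = n_bands),
                      rep(seq_len(n_bands), times = length(files)),
                      sep = "_")
```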
When I extract the values of all layers at the sample points, I get the following result.
> s_points <- st_read(connex,query="SELECT * FROM s_points WHERE NOT ST_IsEmpty(geom);")
> str(s_points)
Classes ‘sf’ and 'data.frame': 286 obs. of 3 variables:
$ s_point_id: int 1 1 2 2 4 4 6 6 7 7 ...
$ kf_klasse : chr "ERBSEN - GETREIDE GEMENGE" "ERBSEN - GETREIDE GEMENGE" "ERBSEN - GETREIDE GEMENGE" "ERBSEN - GETREIDE GEMENGE" ...
$ geom :sfc_POINT of length 286; first list element: 'XY' num 637052 5345218
- attr(*, "sf_column")= chr "geom"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
..- attr(*, "names")= chr [1:2] "s_point_id" "kf_klasse"
> s_points_coords <- st_coordinates(s_points)
> e <- terra::extract(cubef, s_points)
> str(e)
'data.frame': 286 obs. of 445 variables:
$ ID : num 1 1 2 2 3 3 4 4 5 5 ...
$ 2019-03-22_1 : num 0.0789 0.0901 0.0587 0.063 0.0937 0.0901 0.0517 0.0528 0.0819 0.0882 ...
$ 2019-03-22_2 : num 0.096 0.1056 0.0728 0.0771 0.1072 ...
$ 2019-03-22_3 : num 0.108 0.1226 0.0734 0.0788 0.125 ...
$ 2019-03-22_4 : num 0.1301 0.1437 0.0998 0.1017 0.1395 ...
$ 2019-03-22_5 : num 0.166 0.174 0.157 0.151 0.156 ...
$ 2019-03-22_6 : num 0.183 0.188 0.174 0.163 0.169 ...
$ 2019-03-22_7 : num 0.196 0.196 0.183 0.169 0.186 ...
$ 2019-03-22_8 : num 0.27 0.293 0.171 0.172 0.282 ...
$ 2019-03-22_9 : num 0.236 0.269 0.138 0.142 0.252 ...
$ 2019-03-22_10: num 0.29 0.229 0.427 0.365 0.196 ...
$ 2019-03-22_11: num -0.343 -0.299 -0.43 -0.374 -0.268 ...
$ 2019-03-22_12: num 0.1353 0.1108 0.1739 0.1452 0.0928 ...
$ 2019-03-24_1 : num 0.099 0.1088 0.0919 NA 0.1058 ...
$ 2019-03-24_2 : num 0.111 0.115 0.11 NA 0.114 ...
$ 2019-03-24_3 : num 0.116 0.127 0.104 NA 0.131 ...
$ 2019-03-24_4 : num 0.145 0.154 0.147 NA 0.152 ...
$ 2019-03-24_5 : num 0.19 0.19 0.258 NA 0.171 ...
$ 2019-03-24_6 : num 0.208 0.21 0.294 NA 0.186 ...
$ 2019-03-24_7 : num 0.231 0.222 0.31 NA 0.197 ...
$ 2019-03-24_8 : num 0.318 0.341 0.281 NA 0.331 ...
$ 2019-03-24_9 : num 0.283 0.314 0.217 NA 0.305 ...
$ 2019-03-24_10: num 0.329 0.271 0.497 NA 0.202 ...
$ 2019-03-24_11: num -0.35 -0.317 -0.477 NA -0.268 ...
$ 2019-03-24_12: num 0.1698 0.1405 0.291 NA 0.0997 ...
$ 2019-03-29_1 : num NA NA 0.0476 NA 0.0891 0.0847 0.0664 0.0719 NA NA ...
$ 2019-03-29_2 : num NA NA 0.0642 NA 0.0965 ...
$ 2019-03-29_3 : num NA NA 0.0607 NA 0.1196 ...
$ 2019-03-29_4 : num NA NA 0.0904 NA 0.1351 ...
$ 2019-03-29_5 : num NA NA 0.162 NA 0.149 ...
$ 2019-03-29_6 : num NA NA 0.18 NA 0.167 ...
$ 2019-03-29_7 : num NA NA 0.182 NA 0.183 ...
$ 2019-03-29_8 : num NA NA 0.167 NA 0.337 ...
$ 2019-03-29_9 : num NA NA 0.125 NA 0.311 ...
$ 2019-03-29_10: num NA NA 0.5 NA 0.209 ...
$ 2019-03-29_11: num NA NA -0.479 NA -0.309 ...
$ 2019-03-29_12: num NA NA 0.1955 NA 0.0971 ...
$ 2019-04-01_1 : num 0.0616 0.0703 0.0543 0.0573 0.0733 0.0783 0.0675 0.0693 0.0557 0.0584 ...
$ 2019-04-01_2 : num 0.0742 0.0838 0.073 0.076 0.0849 0.0872 0.0783 0.0821 0.0733 0.073 ...
$ 2019-04-01_3 : num 0.0798 0.0945 0.066 0.0758 0.0987 ...
$ 2019-04-01_4 : num 0.101 0.114 0.104 0.106 0.116 ...
$ 2019-04-01_5 : num 0.144 0.143 0.205 0.188 0.129 ...
$ 2019-04-01_6 : num 0.157 0.157 0.231 0.209 0.143 ...
$ 2019-04-01_7 : num 0.17 0.165 0.249 0.214 0.153 ...
$ 2019-04-01_8 : num 0.24 0.259 0.208 0.212 0.275 ...
$ 2019-04-01_9 : num 0.207 0.232 0.152 0.168 0.256 ...
$ 2019-04-01_10: num 0.362 0.272 0.581 0.476 0.216 ...
$ 2019-04-01_11: num -0.393 -0.326 -0.547 -0.475 -0.287 ...
$ 2019-04-01_12: num 0.1449 0.1119 0.2783 0.2137 0.0871 ...
$ 2019-04-16_1 : num 0.0639 0.0695 0.0539 0.0541 0.0767 0.081 0.0754 0.0739 0.0606 0.0621 ...
$ 2019-04-16_2 : num 0.0733 0.0797 0.0717 0.07 0.0834 0.0862 0.0835 0.0854 0.0748 0.0785 ...
$ 2019-04-16_3 : num 0.0832 0.0923 0.0658 0.0626 0.1042 ...
$ 2019-04-16_4 : num 0.108 0.115 0.111 0.107 0.118 ...
$ 2019-04-16_5 : num 0.164 0.159 0.229 0.223 0.136 ...
$ 2019-04-16_6 : num 0.183 0.179 0.26 0.26 0.149 ...
$ 2019-04-16_7 : num 0.202 0.198 0.284 0.275 0.166 ...
$ 2019-04-16_8 : num 0.255 0.27 0.205 0.202 0.288 ...
$ 2019-04-16_9 : num 0.219 0.244 0.141 0.144 0.278 ...
$ 2019-04-16_10: num 0.416 0.364 0.623 0.63 0.23 ...
$ 2019-04-16_11: num -0.467 -0.426 -0.596 -0.595 -0.332 ...
$ 2019-04-16_12: num 0.1846 0.1638 0.3228 0.3181 0.0979 ...
$ 2019-04-18_1 : num 0.0702 0.0792 0.0636 0.063 0.0875 0.094 0.0858 0.0868 0.0662 0.0709 ...
$ 2019-04-18_2 : num 0.0838 0.0946 0.0898 0.0872 0.101 ...
$ 2019-04-18_3 : num 0.0908 0.1038 0.0785 0.0765 0.1206 ...
$ 2019-04-18_4 : num 0.121 0.13 0.13 0.125 0.138 ...
$ 2019-04-18_5 : num 0.186 0.183 0.266 0.253 0.154 ...
$ 2019-04-18_6 : num 0.213 0.205 0.299 0.289 0.167 ...
$ 2019-04-18_7 : num 0.221 0.214 0.312 0.297 0.186 ...
$ 2019-04-18_8 : num 0.275 0.294 0.228 0.228 0.314 ...
$ 2019-04-18_9 : num 0.227 0.255 0.154 0.157 0.296 ...
$ 2019-04-18_10: num 0.418 0.346 0.598 0.59 0.214 ...
$ 2019-04-18_11: num -0.45 -0.387 -0.553 -0.546 -0.297 ...
$ 2019-04-18_12: num 0.199 0.167 0.335 0.321 0.101 ...
$ 2019-04-21_1 : num 0.0404 0.0619 0.0373 0.0351 0.0814 0.0844 0.0764 0.0801 0.0563 0.0626 ...
$ 2019-04-21_2 : num 0.0592 0.0823 0.0614 0.0579 0.0927 0.0966 0.0933 0.0952 0.0776 0.0869 ...
$ 2019-04-21_3 : num 0.0542 0.0873 0.048 0.0433 0.1118 ...
$ 2019-04-21_4 : num 0.082 0.105 0.0933 0.0841 0.1279 ...
$ 2019-04-21_5 : num 0.15 0.163 0.225 0.207 0.144 ...
$ 2019-04-21_6 : num 0.173 0.184 0.259 0.247 0.155 ...
$ 2019-04-21_7 : num 0.174 0.199 0.274 0.251 0.172 ...
$ 2019-04-21_8 : num 0.192 0.237 0.168 0.156 0.291 ...
$ 2019-04-21_9 : num 0.1352 0.1804 0.0994 0.0903 0.2674 ...
$ 2019-04-21_10: num 0.525 0.391 0.702 0.706 0.213 ...
$ 2019-04-21_11: num -0.493 -0.415 -0.634 -0.625 -0.3 ...
$ 2019-04-21_12: num 0.1954 0.174 0.3422 0.3212 0.0941 ...
$ 2019-05-01_1 : num 0.0342 0.0435 0.0282 0.0292 0.07 0.0684 0.0722 0.0757 0.0458 0.061 ...
$ 2019-05-01_2 : num 0.0516 0.055 0.0517 0.048 0.0781 0.0793 0.0861 0.0919 0.0613 0.0839 ...
$ 2019-05-01_3 : num 0.0422 0.0538 0.0299 0.0325 0.0991 ...
$ 2019-05-01_4 : num 0.0753 0.0836 0.0761 0.0755 0.1112 ...
$ 2019-05-01_5 : num 0.182 0.177 0.247 0.235 0.124 ...
$ 2019-05-01_6 : num 0.21 0.203 0.3 0.287 0.138 ...
$ 2019-05-01_7 : num 0.214 0.19 0.314 0.293 0.157 ...
$ 2019-05-01_8 : num 0.164 0.182 0.148 0.146 0.264 ...
$ 2019-05-01_9 : num 0.0988 0.1156 0.0777 0.0763 0.235 ...
$ 2019-05-01_10: num 0.67 0.559 0.826 0.801 0.225 ...
$ 2019-05-01_11: num -0.611 -0.552 -0.717 -0.719 -0.334 ...
$ 2019-05-01_12: num 0.273 0.2196 0.4226 0.3935 0.0916 ...
$ 2019-05-26_1 : num 0.0537 0.0633 0.0431 0.0444 0.118 ...
$ 2019-05-26_2 : num 0.0675 0.0835 0.0611 0.0564 0.1284 ...
[list output truncated]
What I have now is a dataframe with one column per band of each image (12 columns per image), i.e. 37 x 12 = 444 columns. From here on, I don't know how to join the extracted values back to the s_points dataframe, so that each row carries the point ID and class name, and how to reshape the 444 values per point into the long format shown above.
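To make the target shape concrete, this is roughly the reshape I have in mind (an untested sketch; it assumes the layer names keep the form "YYYY-MM-DD_<band index>" as above and that the rows of `e` line up with the rows of `s_points`):

```r
library(dplyr)
library(tidyr)

# rows of e correspond to rows of s_points, so the class can be attached directly
e$class <- s_points$kf_klasse

# wide (444 columns) -> long: one row per point and date, one column per band;
# splitting on "_" works because the date part itself contains no underscore
train_df <- e |>
  pivot_longer(cols = -c(ID, class),
               names_to = c("time", "band"),
               names_sep = "_",
               values_to = "value") |>
  pivot_wider(names_from = band, values_from = value)
# intended result: ID, class, time, then 12 band columns per row
```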
My questions are:
- How can I combine the extracted values with the sample points?
- How can I train an rf model with this extracted data?
- Does it make more sense to use a data cube here (gdalcubes in R)? I dropped this idea mainly because of the irregular time series, which would cause problems with the temporal aggregation; such aggregation is not appropriate for my research question. Thanks
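For context, the second question boils down to whether the final step can look roughly like this (sketch only, not verified; `randomForest` and `terra::predict` are what I would try, and keeping the table wide with all 444 predictors is just one option I considered):

```r
library(randomForest)
library(terra)

# one option: keep the extracted table wide (444 predictors per point);
# the layer names are non-syntactic ("2019-03-22_1"), so I use the x/y
# interface of randomForest instead of a formula
train_wide <- na.omit(data.frame(e[-1], class = s_points$kf_klasse,
                                 check.names = FALSE))
rf <- randomForest(x = train_wide[, names(train_wide) != "class"],
                   y = as.factor(train_wide$class))

# per-pixel prediction; predictor names must match the layer names of cubef
classified <- terra::predict(cubef, rf, na.rm = TRUE)
```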