2

I want to compute an unsupervised random forest classification out of a raster stack in R. The raster stack represents the same extent in different spectral bands and as a result I want to obtain an unsupervised classification of the stack. I am having problems with my code as my data is very huge. Is it okay to just convert the stack into a dataframe in order to run the random forest algorithm like this:

stack_median <- stack(b1_mosaic_median, b2_mosaic_median, b3_mosaic_median, b4_mosaic_median, b5_mosaic_median, b7_mosaic_median)
stack_median_df <- as.data.frame(stack_median)

Here is the data as a csv file (https://www.dropbox.com/s/gkaryusnet46f0i/stack_median_df.csv?dl=0) - and you can read it in via:

stack_median_df<-read.csv(file="stack_median_df.csv")
stack_median_df<-stack_median_df[,-1]
stack_median_df_na <- na.omit(stack_median_df)

My next step would be the unsupervised classification:

median_rf <- randomForest(stack_median_df_na, importance=TRUE, proximity=FALSE, ntree=500, type=unsupervised, forest=NULL)

Due to my huge dataset a proximity measure can't be calculated (would need around 6000GB). Do you know how to be able to have a look at the classification? As predict(median_rf) and plot(median_rf) don't return anything.

I am happy for every suggestion, improvement or code snippet of a unsupervised random forest classification with its accuracy measures,... Thanks a lot!

user2978751
  • 57
  • 1
  • 10

1 Answers1

2

I think you could use a large sample for unsupervised classification, and then use the create a supervised classification model (that predicts the classes from the raw data; and should have a very good fit) and apply that to the entire data set.

Robert Hijmans
  • 40,301
  • 4
  • 55
  • 63