Ok, I finally found some time to do this. It's not the too elegant, but should work.
At first, load the libraries and the function below (you'll need to install data.table
package)
library(data.table)
library(party)
WeightFunc <- function(data, DV){
# Creating some paste function in order to paste unique paths
paste2 <- function(x) paste(x, collapse = ",")
ignore <- DV
# Creating unique paths
test3 <- apply(data[setdiff(names(data),ignore)], 1, paste2)
# Binding the unique paths vector back to the original data
data <- cbind(data, test3)
#data
# Getting the values of each explaining variable per each unique path
dt <- data.table(data[setdiff(names(data), ignore)])
dt.out <- as.data.frame(dt[, head(.SD, 1), by = test3])
# Creating dummy variables per each value of our dependable variable for further calculations
DVLvs <- as.character(unique(data[, DV]))
data[, DVLvs[1]] <- ifelse(data[, DV] == DVLvs[1], 1, 0)
data[, DVLvs[2]] <- ifelse(data[, DV] == DVLvs[2], 1, 0)
data[, DVLvs[3]] <- ifelse(data[, DV] == DVLvs[3], 1, 0)
# Summing dummy variables per unique path
dt <- data.table(data[c("test3", DVLvs)])
dt.out2 <- as.data.frame(dt[, lapply(.SD, sum), by = test3])
# Binding unique pathes with sums
dt.out2$test3 <- dt.out$test3 <- NULL
test <- cbind(dt.out, dt.out2)
# Duplicating the data in order to create a weights for every level of expalined variable
test2 <- test[rep(1:nrow(test),each = 3), ]
test2 <- cbind(test2, AdjDV = DVLvs)
test2$Weights <- ifelse(is.element(seq(1:nrow(test2)), grep("[.]1", rownames(test2))), test2[, DVLvs[2]],
ifelse(is.element(seq(1:nrow(test2)), grep("[.]2",rownames(test2))), test2[, DVLvs[3]], test2[, DVLvs[1]]))
# Deleting unseassery column
test2[, DVLvs[1]] <- test2[, DVLvs[2]] <- test2[, DVLvs[3]] <- NULL
return(test2)
}
Now run this function on your data set where data
is your data and DV
is your explained variable name (in quotes) and save it in a new dataset, for example:
Newdata <- WeightFunc(data = Mydata, DV = "Success")
Now, this process could take a while if you have many unique pathes, but it shouldn't overload your memory. If you don't have too many unique paths, this function should reduce your data set by tens or even hundred times. Also, this function is only good for 3 level factor explained variable (like you have).
After that, you can run the ctree
as you were doing previously, but with the new data and the new explained variable (which will be called AdjDV
) and wiegths parameter which called Weights
. You'll also have to exclude Weights
out of the dataset while running the ctree
.
Like that:
ct <- ctree(AdjDV ~., data = Newdata[setdiff(names(Newdata), "Weights")], weights = Newdata$Weights)