I am trying to run a linear discriminant analysis (lda()) on a data.frame I created by dividing several variables by an additional scaling variable (not shown here) in R, using the MASS package. Below is a sample dataset and a sample version of the code I am using that reproduces the error.
class Var1 Var2 Var3 Var4
2 0.732459522 0.973014649 0.612952968 0.127216654
3 0.76692254 0.990230286 0.629448709 0.104675506
2 0.847487002 1.021663778 0.649046794 0.187175043
3 0.823583181 1.050274223 0.673674589 0.170018282
1 0.796279894 1.058458813 0.583702391 0.222320638
2 0.925681255 1.009909166 0.636663914 0.205615194
2 0.627334465 1.074702886 0.59762309 0.23344652
3 0.980376124 1.011447261 0.646770237 0.232215863
3 0.79342723 1.048826291 0.750234742 0.248826291
1 0.960655738 1.042622951 0.6 0.262295082
2 0.963788301 1.005571031 0.590529248 0.233983287
1 1.013157895 1.049342105 0.657894737 0.223684211
2 1.211538462 1.060897436 0.733974359 0.288461538
3 1.25083612 1.023411371 0.759197324 0.311036789
3 0.959196485 1.009416196 0.635907094 0.12868801
1 0.823681936 1.005185825 0.590319793 0.219533276
2 0.777508091 0.998381877 0.624595469 0.165048544
3 0.749114103 0.985825656 0.585400425 0.133947555
1 0.816999133 1.036426713 0.604509974 0.197745013
library(MASS)
data<-read.csv("data.csv",header=TRUE)
data_train<-na.omit(data)
scores_train<-data_train[-c(1)]
lda_train<-lda(data_train$class~scores_train,prior = c(1,1,1)/3,CV=TRUE)
scores_test<-data[-c(1)]
lda_test<-predict(lda_train,as.data.frame(scores_test),prior = c(1,1,1)/3)
lda_train<-lda(data_train$class~as.matrix(scores_train),prior = c(1,1,1)/3,CV=TRUE)
class(scores_train)
class(scores_test)
When I try to perform the LDA on this dataset, I get the following error message:
Error in model.frame.default(formula = data_train$class ~ scores_train) :
invalid type (list) for variable 'scores_train'
I can get the model to fit by coercing the predictors to a matrix with as.matrix(). Notably, trying to do something similar with as.data.frame() or data.frame() does not work. However, when I then try to apply the resulting discriminant function to the full dataset, I get the following message:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "list"
However, when I check the class of both objects using class(), it reports that both are data.frames. I checked the dataset to see if there were any incomplete rows or columns that could cause R to treat them as a series of lists instead of a single data.frame, but there are no missing values. Similarly, it does not appear to be due to the names of any variables.
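For what it's worth, here is a minimal sketch of what the class check does and does not show. It assumes a tiny made-up data.frame (the names df and scores are only for illustration, not from my real data). As far as I can tell, a data.frame reports class "data.frame" but is stored as a list of columns, which would explain the wording of the first error when a whole data.frame is used as a single term in a formula.

```r
# Hypothetical toy data, only to illustrate the point
df <- data.frame(class = c(1, 2, 3, 1),
                 Var1  = c(0.73, 0.77, 0.85, 0.82),
                 Var2  = c(0.97, 0.99, 1.02, 1.05))
scores <- df[-1]           # still a data.frame of predictors

class(scores)              # "data.frame"
typeof(scores)             # "list": a data.frame is a list of columns
is.list(scores)            # TRUE

# Using the whole data.frame as one variable in a formula fails,
# because model.frame() sees a variable of type list
msg <- tryCatch(
  {
    model.frame(df$class ~ scores)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# A matrix of the same values is a plain numeric object, which is
# presumably why the as.matrix() workaround gets past this step
typeof(as.matrix(scores))  # "double"
```

So class() returning "data.frame" and the error mentioning "list" are not necessarily in conflict.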
I am not sure why R is treating the object as a list instead of a data.frame (and thereby causing the linear discriminant analysis to fail), especially as it recognizes that the objects are of class data.frame.
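One possible reading of the second error, sketched below under stated assumptions: according to the MASS documentation, lda() called with CV=TRUE returns a plain list of leave-one-out results (class and posterior) rather than a fitted "lda" object, so predict() would have no method to dispatch on. The sketch reads the sample data from above inline, passes the data.frame through the data argument with a class ~ . formula, and fits once with and once without CV=TRUE. The object names (dat, cv_out, fit, pred) are mine, and this is only an illustration, not necessarily the right analysis for the real data.

```r
library(MASS)

# Sample data from the question, read inline so the sketch is self-contained
dat <- read.table(header = TRUE, text = "
class Var1 Var2 Var3 Var4
2 0.732459522 0.973014649 0.612952968 0.127216654
3 0.76692254 0.990230286 0.629448709 0.104675506
2 0.847487002 1.021663778 0.649046794 0.187175043
3 0.823583181 1.050274223 0.673674589 0.170018282
1 0.796279894 1.058458813 0.583702391 0.222320638
2 0.925681255 1.009909166 0.636663914 0.205615194
2 0.627334465 1.074702886 0.59762309 0.23344652
3 0.980376124 1.011447261 0.646770237 0.232215863
3 0.79342723 1.048826291 0.750234742 0.248826291
1 0.960655738 1.042622951 0.6 0.262295082
2 0.963788301 1.005571031 0.590529248 0.233983287
1 1.013157895 1.049342105 0.657894737 0.223684211
2 1.211538462 1.060897436 0.733974359 0.288461538
3 1.25083612 1.023411371 0.759197324 0.311036789
3 0.959196485 1.009416196 0.635907094 0.12868801
1 0.823681936 1.005185825 0.590319793 0.219533276
2 0.777508091 0.998381877 0.624595469 0.165048544
3 0.749114103 0.985825656 0.585400425 0.133947555
1 0.816999133 1.036426713 0.604509974 0.197745013
")
dat$class <- factor(dat$class)

# With CV = TRUE the result holds leave-one-out classifications only;
# it is a plain list, so predict() cannot be applied to it
cv_out <- lda(class ~ ., data = dat, prior = c(1, 1, 1) / 3, CV = TRUE)
class(cv_out)   # "list"

# Without CV the result is a fitted model of class "lda"
fit <- lda(class ~ ., data = dat, prior = c(1, 1, 1) / 3)
class(fit)      # "lda"

# predict() dispatches on the "lda" object; newdata is a data.frame
# whose column names match the predictors in the formula
pred <- predict(fit, newdata = dat)
table(observed = dat$class, predicted = pred$class)
```

If this reading is right, the cross-validated classifications would be taken from cv_out$class directly, and a separate fit without CV=TRUE would be the one passed to predict().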