I'm using Databricks with the SparkR package to build a glm model. Everything seems to run OK except summary(lm1): instead of the usual coefficient table with Variable, Estimate, Std. Error, t-value, and p-value (see pic below - this is what I'd expect to see, NOT what I'm getting), I only get the variable names and estimates. The only explanation I can think of is that the data set is large enough (train1 has 12 million rows and test1 has 6 million) that every estimate ends up with a p-value of 0. Are there any other reasons this could happen?
library(SparkR)

rdf <- sql("select * from myTable")  # read data as a SparkDataFrame
train1 <- rdf[rdf$ntile_3 != 1, ]    # split into train and test based on ntile in the table
test1  <- rdf[rdf$ntile_3 == 1, ]

vtu1 <- c('var1', 'var2', 'var3')    # predictor columns
lm1  <- glm(target ~ ., data = train1[, c(vtu1, 'target')], family = 'gaussian')
pred1 <- predict(lm1, test1)
summary(lm1)
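For reference, this is a sketch of the equivalent call using spark.glm, which (as I understand the SparkR API) is the function the glm() wrapper dispatches to for a SparkDataFrame, and whose summary() is documented to print the full coefficient table:

```r
# Sketch only, not verified on my cluster: spark.glm takes (data, formula, family)
# rather than (formula, data, family) like the glm() wrapper.
lm2 <- spark.glm(train1[, c(vtu1, 'target')], target ~ ., family = 'gaussian')
summary(lm2)  # expected to show Estimate, Std. Error, t value, Pr(>|t|)
```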