I am unable to fix the error below when fitting the following logistic regression:

training=(IBM$Serial<625)
data=IBM[!training,]

stock.direction <- data$Direction
training_model=glm(stock.direction~data$lag2,data=data,family=binomial)

###Error### ---- Error in eval(family$initialize) : y values must be 0 <= y <= 1

A few rows from the data I am using:

X   Date    Open    High    Low Close   Adj.Close   Volume  Return  lag1    lag2    lag3    Direction   Serial
1   28-11-2012  190.979996  192.039993  189.270004  191.979996  165.107727  3603600 0.004010855 0.004010855 -0.001198021    -0.006354834    Up  1
2   29-11-2012  192.75  192.899994  190.199997  191.529999  164.720734  4077900 0.00114865  0.00114865  -0.004020279    -0.009502386    Up  2
3   30-11-2012  191.75  192 189.5   190.070007  163.465073  4936400 0.003630178 0.003630178 -0.001894039    -0.005576956    Up  3
4   03-12-2012  190.759995  191.300003  188.360001  189.479996  162.957703  3349600 0.001213907 0.001213907 -0.002480478    -0.001636046    Up  4
Akhil Doppalapudi

3 Answers

glm is asking for y values between 0 and 1 because the categorical columns in your data, such as Direction, are of type character. Convert them to type factor with as.factor(data$Direction), then fit glm(Direction ~ lag2, data=...). There is no need to declare stock.direction separately.

You can check the class of a variable with class(variable). If it is character, convert it to a factor and store it as a new column in the same data frame. The model should then fit.
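As a rough sketch of the check-and-convert step described above: the data frame `toy` and its values are made up for illustration and only stand in for the IBM data; the column names `Direction` and `lag2` are taken from the question.

```r
# Toy data standing in for the IBM data frame (values are invented for illustration)
set.seed(1)
toy <- data.frame(
  lag2 = rnorm(20),
  Direction = rep(c("Up", "Down"), times = 10),
  stringsAsFactors = FALSE
)

# The response is a character vector at this point, which is what
# triggers the "y values must be 0 <= y <= 1" error in a binomial glm
class(toy$Direction)

# Convert the response to a factor; levels are sorted alphabetically,
# so "Down" is the first level and is treated as the failure class
toy$Direction <- as.factor(toy$Direction)
levels(toy$Direction)

# Refer to columns by bare name and let `data =` supply them,
# rather than mixing data$lag2 with a separately declared response
fit <- glm(Direction ~ lag2, data = toy, family = binomial)
coef(fit)
```

With a factor response, the binomial family models the probability of the second level ("Up" here), so the sign of the lag2 coefficient should be read relative to that level.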

Nidhi Agarwal
    Just reference `as.factor(data$Direction)`. So: `glm(Direction ~ lag2, data=...)` Don't need to declare stock.direction. – smci Apr 30 '18 at 22:06
I was getting the same error,

Error in eval(family$initialize) : y values must be 0 <= y <= 1

and solved it by adding stringsAsFactors=T to the read.csv function.

BEFORE : gene.train = read.csv("gene.train.csv", header=T) # error

AFTER : gene.train = read.csv("gene.train.csv", header=T, stringsAsFactors=T) # no error.
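The effect of that argument can be seen in a small self-contained sketch. The CSV text below is invented for illustration and is passed through the `text` argument instead of a file. Note that since R 4.0.0 the default is stringsAsFactors = FALSE, so string columns stay character unless you ask for factors.

```r
# Invented CSV content standing in for a file on disk
csv_text <- "lag2,Direction
1.2,Up
-0.4,Down
0.7,Up
-1.1,Down
0.3,Down
-0.2,Up"

# Same data read twice, differing only in stringsAsFactors
as_chr <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

class(as_chr$Direction)  # "character"
class(as_fct$Direction)  # "factor"

# The factor version can be used directly as a binomial response
fit <- glm(Direction ~ lag2, data = as_fct, family = binomial)
coef(fit)
```

Setting stringsAsFactors = TRUE converts every string column, so if only the response should be a factor, converting that single column with as.factor may be the safer choice.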

binmosa

Without knowing more about the data, you could do something like this: recode Direction into a 0/1 column and use that as the response.

library(dplyr)
df <- read.table(header = T, stringsAsFactors = F,  text ="X   Date    Open    High    Low Close   Adj.Close   Volume  Return  lag1    lag2    lag3    Direction   Serial
1   28-11-2012  190.979996  192.039993  189.270004  191.979996  165.107727  3603600 0.004010855 0.004010855 -0.001198021    -0.006354834    Up  1
2   29-11-2012  192.75  192.899994  190.199997  191.529999  164.720734  4077900 0.00114865  0.00114865  -0.004020279    -0.009502386    Up  2
3   30-11-2012  191.75  192 189.5   190.070007  163.465073  4936400 0.003630178 0.003630178 -0.001894039    -0.005576956    Up  3
4   03-12-2012  190.759995  191.300003  188.360001  189.479996  162.957703  3349600 0.001213907 0.001213907 -0.002480478    -0.001636046    Up  4
1   28-11-2012  190.979996  192.039993  189.270004  191.979996  165.107727  3603600 0.004010855 0.004010855 -0.001198021    -0.006354834    Up  1
2   29-11-2012  192.75  192.899994  190.199997  191.529999  164.720734  4077900 0.00114865  0.00114865  -0.004020279    -0.009502386    Down  2
3   30-11-2012  191.75  192 189.5   190.070007  163.465073  4936400 0.003630178 0.003630178 -0.001894039    -0.005576956    Up  3
4   03-12-2012  190.759995  191.300003  188.360001  189.479996  162.957703  3349600 0.001213907 0.001213907 -0.002480478    -0.001636046    Down  4
") %>%
  mutate(bin = ifelse(Direction == "Up", 1, 0))

glm(bin ~ High, family = "binomial", data = df)
Mateusz1981