I have data like the below, and I am trying to build a soccer prediction model in R.

Div                Date             HomeTeam           AwayTeam              FTHG            FTAG           FTR                 HTHG      
 Length:2184        Length:2184        Length:2184        Length:2184        Min.   :0.000   Min.   :0.000   Length:2184        Min.   :0.000  
 Class :character   Class :character   Class :character   Class :character   1st Qu.:1.000   1st Qu.:0.000   Class :character   1st Qu.:0.000  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   Median :1.000   Median :1.000   Mode  :character   Median :0.000  
                                                                             Mean   :1.539   Mean   :1.192                      Mean   :0.668  
                                                                             3rd Qu.:2.000   3rd Qu.:2.000                      3rd Qu.:1.000  
                                                                             Max.   :7.000   Max.   :7.000                      Max.   :5.000

FTR contains H(ome), A(way), or D(raw), so it is a character column in the data.frame.

While I am using this code below:

glm.fits = glm(FTR ~ .,data=alldata,family=binomial)

I am getting this error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

How can I solve it?

Edit: I changed FTR to a vector, but it is still not working.

Solution: I decided to change my FTR from A,H,D to A(away), NA(not away). I guess this will solve the problem.

sanwhere
  • You have a multinomial problem, so binomial models should not be considered. It's not clear why you are mixing the results and the locations in one variable. Did you mean to have the outcome as win, loss or draw? – IRTFM Apr 17 '21 at 16:21
  • @IRTFM I didn't mean anything, I just want to make it work, I need a prediction with FTR column. How can I do that? – sanwhere Apr 17 '21 at 16:41
  • Add a minimal reproducible example -- use `dput()` to collect a piece of the data you're using and add it to your post. You'll get the help! – Kat Apr 17 '21 at 20:16
  • Chances are your values for Date are unique, since it is a character column, and in that case it's meaningless to fit the regression. Can you exclude Date? – StupidWolf Apr 18 '21 at 07:56
  • @StupidWolf yea I can, I will try! Do you think this is the reason? – sanwhere Apr 19 '21 at 15:22

1 Answer

The glm function with family=binomial fits a logistic regression, which is only appropriate for a response variable with exactly 2 levels.

If you want predictions for a response with 3 (or more) levels, then you need some form of categorical/multinomial model. If there is a logical ordering to the 3 responses, then proportional odds logistic regression can be used. The polr function in the MASS package and the lrm function in the rms package both implement it (as do other functions in other packages).
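As a rough illustration of the ordered case, here is a minimal sketch using polr. The data are simulated, the predictor name strength_gap is made up, and treating the outcome as ordered A < D < H (from the home team's point of view) is an assumption you would need to justify for your own data.

```r
library(MASS)  # polr() is in MASS, which ships with R

set.seed(2)
n <- 300

# Simulated stand-in for the real data; strength_gap is a hypothetical predictor
toy <- data.frame(strength_gap = rnorm(n))
latent <- toy$strength_gap + rlogis(n)

# Assumption: outcomes are ordered A < D < H, so build an ordered factor
toy$FTR <- cut(latent,
               breaks = c(-Inf, -0.7, 0.7, Inf),
               labels = c("A", "D", "H"),
               ordered_result = TRUE)

# Proportional odds logistic regression
fit_ord <- polr(FTR ~ strength_gap, data = toy, Hess = TRUE)

# Predicted probability of each outcome for three new matches
new_matches <- data.frame(strength_gap = c(-1, 0, 1))
ord_probs <- predict(fit_ord, newdata = new_matches, type = "probs")
print(ord_probs)
```

Note that polr expects the response to be a factor (ideally an ordered one), so a character FTR column would have to be converted first.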

For unordered categories it is more complicated, but you can search for other options (just be sure you understand what is being modeled). The multinom function in the nnet package is one.
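For the unordered case, a minimal sketch with multinom might look like the following. Again the data are simulated and the predictor names home_form and away_form are hypothetical; they only stand in for whatever numeric predictors you actually have.

```r
library(nnet)  # multinom() is in nnet, which ships with R

set.seed(1)
n <- 300

# Simulated stand-in for the real data; both predictors are made up
toy <- data.frame(home_form = rnorm(n), away_form = rnorm(n))
score <- toy$home_form - toy$away_form + rnorm(n)

# Unordered 3-level factor response (A, D, H)
toy$FTR <- cut(score,
               breaks = c(-Inf, -0.5, 0.5, Inf),
               labels = c("A", "D", "H"))

# Multinomial logistic regression; trace = FALSE silences the fitting log
fit_multi <- multinom(FTR ~ home_form + away_form, data = toy, trace = FALSE)

# Class probabilities and the most likely class for the first five matches
multi_probs <- predict(fit_multi, newdata = toy[1:5, ], type = "probs")
multi_class <- predict(fit_multi, newdata = toy[1:5, ], type = "class")
print(multi_probs)
print(multi_class)
```

Here each row of the probability matrix gives the estimated chance of A, D and H for one match, which is usually more useful for prediction than a single class label.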

Greg Snow
  • I just tried to use the polr and lrm functions, and they gave me the error as well. Even though the columns are factors, I am still getting errors. Should I change the data to numeric? Like A: 1, H: 2, D: 0 – sanwhere Apr 17 '21 at 17:34
  • @sanwhere, What is the result of calling the `str` function on your dataset? This can tell us how R sees the data (we may consider it a factor, but R sees it differently). – Greg Snow Apr 19 '21 at 17:03
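To make the str suggestion concrete, here is a small sketch of how the columns could be inspected. The "contrasts can be applied only to factors with 2 or more levels" error is commonly raised when a character or factor predictor has fewer than two distinct values, so counting distinct values per column is one way to look for a culprit. The data frame below is a made-up stand-in (the values "E0" and the team names are invented); whether any column in the real data is constant is not known from the post.

```r
# Hypothetical stand-in for alldata; all values here are invented
alldata <- data.frame(
  Div      = rep("E0", 6),
  HomeTeam = c("Arsenal", "Chelsea", "Arsenal", "Everton", "Chelsea", "Everton"),
  FTHG     = c(1, 2, 0, 3, 1, 2),
  FTR      = c("H", "A", "D", "H", "A", "D"),
  stringsAsFactors = FALSE
)

# How R actually sees each column (character, numeric, factor, ...)
str(alldata)

# Number of distinct non-missing values in each column
n_distinct <- sapply(alldata, function(col) length(unique(na.omit(col))))
print(n_distinct)

# Columns with fewer than 2 distinct values cannot be given contrasts
constant_cols <- names(n_distinct)[n_distinct < 2]
print(constant_cols)

# Drop those columns before fitting with a `~ .` formula
cleaned <- alldata[, setdiff(names(alldata), constant_cols), drop = FALSE]
```

With a formula like FTR ~ . every remaining column becomes a predictor, so it is also worth dropping columns such as Date that are unique (or nearly unique) per row.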