-1

OK, this question is very basic, but I can't figure it out, so I need your help. I understand the idea of splitting age into categories. For example: good graph (:

I don't understand how the model knows that the 30< category comes before the 31-45 category, that the 31-45 category comes before the 46-60 category, and so on. How does the model know not to make this graph: bad graph ):

Thanks!

Amit S
  • You need to have the age classes as a factor; that lets you specify the order. – Richard Telford Dec 18 '19 at 11:59
  • When plotting, you give the model the sequence of age categories; the model does not need them in other areas. – Nikos M. Dec 18 '19 at 11:59
  • The age intervals are an ordinal variable. In R, these can be represented by an `ordered` factor. Typically, these are included in a model using dummy encoding with polynomial contrasts. – Roland Dec 18 '19 at 12:03

1 Answer

2

Consider this example:

age <- 1:100

# cut() already returns a factor; its levels follow the order of the breaks
fctr <- cut(age, breaks = c(0, 25, 50, 75, 100))

print(levels(fctr))

[1] "(0,25]"   "(25,50]"  "(50,75]"  "(75,100]"

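The output shows how the levels are ordered: they follow the order of the breaks, not the order in which values appear in the data. This is the order that plot and ggplot2 will use on the axis. You can change this order in the following way:

# rebuild the factor, listing the existing levels in the order you want
fctr2 <- factor(fctr, levels = levels(fctr)[c(2, 1, 3, 4)])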

print(levels(fctr2))

[1] "(25,50]"  "(0,25]"   "(50,75]"  "(75,100]"
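To tie this back to the age groups in the question, here is a minimal sketch with labelled bins. The ages, break points and labels are made up for illustration and are not taken from the original data. Setting `ordered_result = TRUE` makes `cut()` return an ordered factor, which is what Roland's comment refers to; with R's default options, a model then uses polynomial contrasts for it.

```r
# Hypothetical ages; the breaks and labels only mimic the question's categories.
age <- c(22, 35, 47, 58, 64, 29, 41)

age_group <- cut(
  age,
  breaks = c(0, 30, 45, 60, Inf),               # intervals are right-closed by default
  labels = c("<=30", "31-45", "46-60", ">60"),  # one label per interval, in break order
  ordered_result = TRUE                         # return an ordered factor
)

# Level order comes from the breaks, not from alphabetical sorting of the labels.
print(levels(age_group))

# Counts are reported in level order as well.
print(table(age_group))

# Ordered factors support comparisons between categories.
print(age_group[1] < age_group[2])

# Assuming the default options("contrasts"), ordered factors get polynomial contrasts.
print(contrasts(age_group))
```

So the "knowledge" of the order lives in the factor's levels. The plot or model only reads it from there.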

If you work with factors often, consider using the forcats package.
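As a rough sketch of what that could look like, the reordering above can be written with `fct_relevel()`, and `fct_rev()` reverses the levels. This assumes forcats is installed; the data are the same toy example as above.

```r
library(forcats)

age <- 1:100
fctr <- cut(age, breaks = c(0, 25, 50, 75, 100))

# Move "(25,50]" to the front; the remaining levels keep their relative order.
fctr2 <- fct_relevel(fctr, "(25,50]")
print(levels(fctr2))

# Reverse the level order, e.g. to flip an axis in a plot.
print(levels(fct_rev(fctr)))
```

Referring to levels by name rather than by position tends to be less fragile if the breaks change later.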

BerriJ