5

Consider the following example:

> library(forcats)
> library(dplyr)
> 
> 
> dataframe <- data_frame(var = c(1,1,1,2,3,4),
+                         var2 = c(10,9,8,7,6,5))
> dataframe
# A tibble: 6 x 2
    var  var2
  <dbl> <dbl>
1  1.00 10.0 
2  1.00  9.00
3  1.00  8.00
4  2.00  7.00
5  3.00  6.00
6  4.00  5.00

I create a factor variable

> dataframe <- dataframe %>% mutate(myfactor = factor(var))
> 
> dataframe$myfactor
[1] 1 1 1 2 3 4
Levels: 1 2 3 4

I do not understand the correct syntax (or the logic) for reordering this factor according to some other computation done at the factor level.

For instance, I would like to reorder the factor levels by the mean of var2 within each level:

> data_rank <- dataframe %>% group_by(myfactor) %>% summarise(rank_var = mean(var2))

> data_rank
# A tibble: 4 x 2
  myfactor rank_var
  <fct>       <dbl>
1 1            9.00
2 2            7.00
3 3            6.00
4 4            5.00

So 4 would be first, 3 would be second, etc.

What is the syntax to do so with fct_reorder, and what is the logic behind it?

Thanks!

ℕʘʘḆḽḘ

2 Answers

6

Suppose your dataframe is:

dataframe <- data_frame(var = c(1,1,1,2,3,4),var2 = c(10,2,0,15,6,5))
dataframe <- dataframe %>% mutate(myfactor = factor(var))
dataframe$myfactor

[1] 1 1 1 2 3 4
Levels: 1 2 3 4

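If you want to reorder the levels of a factor by the result of a summary function applied to another vector, grouped by the factor, you can use fct_reorder. Recent forcats releases name the arguments .f, .x and .fun (older releases used f, x and fun), so passing the factor, the vector and the function by position works in either case:

dataframe$myfactor <- fct_reorder(dataframe$myfactor, dataframe$var2, mean)
dataframe$myfactor
[1] 1 1 1 2 3 4
Levels: 1 4 3 2

The mean of dataframe$var2 is computed for each factor level (here 4 for level 1, 15 for level 2, 6 for level 3 and 5 for level 4), and the levels are then sorted by that mean, in ascending order by default.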
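The logic can be spelled out by hand. The following is a minimal sketch, assuming a recent forcats release with the `.fun` and `.desc` argument names; the helper names `level_means` and `by_hand` are made up for illustration, and the base R steps only mimic what fct_reorder does rather than reproduce its source.

```r
library(forcats)

# Same data as in the answer above.
myfactor <- factor(c(1, 1, 1, 2, 3, 4))
var2 <- c(10, 2, 0, 15, 6, 5)

# Step 1: apply the summary function to var2 within each level.
level_means <- tapply(var2, myfactor, mean)
# level 1 -> 4, level 2 -> 15, level 3 -> 6, level 4 -> 5

# Step 2: sort the level names by that summary (ascending) and rebuild the factor.
by_hand <- factor(myfactor, levels = names(level_means)[order(level_means)])

# fct_reorder() performs both steps in one call.
reordered <- fct_reorder(myfactor, var2, .fun = mean)
levels(reordered)    # "1" "4" "3" "2"

# .desc = TRUE reverses the direction, so the largest mean comes first.
descending <- fct_reorder(myfactor, var2, .fun = mean, .desc = TRUE)
levels(descending)   # "2" "3" "4" "1"
```

Only the level order changes; the values stored in the factor stay as they were.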

tushaR
  • something i dont get is whether x must be in the same dataframe as the factor – ℕʘʘḆḽḘ Feb 15 '18 at 11:30
  • No. `x` can be any vector of the same length as `f`; it is grouped by `f` and `fun` is applied to each group. The whole point of `fct_reorder` is that you don't have to compute `rank_var` explicitly. – tushaR Feb 15 '18 at 12:59
  • thanks and sorry if I ask obvious questions. there is still something I dont get. you are saying that under the hood the `fun` will be applied within rows of the same factor level? (essentially doing a `group_by(myfactor)` computation? – ℕʘʘḆḽḘ Feb 15 '18 at 13:17
  • also I guess that each element of `x`, say `x_i` is associated with the element of f at the same exact row? – ℕʘʘḆḽḘ Feb 15 '18 at 13:20
  • @ℕʘʘḆḽḘ Yeah. You have four levels of the factor, so to order them you need four values to rank, and these are calculated by the function `fun`. For each level, `fun` receives the values of the vector given as `x` (`dataframe$var2` in this case) that belong to that level. I used a different set of values so that you can see the effect of different functions such as `sum`, `max`, and `min` on the order of the levels. – tushaR Feb 15 '18 at 14:26
1

To understand fct_reorder, I created a similar but modified data frame.

> dataframe <- data_frame(var = as.factor(c(1,2,3,2,3,1,4,1,2,3,4)),var2 = c(1,5,4,2,6,2,9,8,7,6,3))

> str(dataframe)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   11 obs. of  2 variables:
 $ var : Factor w/ 4 levels "1","2","3","4": 1 2 3 2 3 1 4 1 2 3 ...
 $ var2: num  1 5 4 2 6 2 9 8 7 6 ...

Here we can see that there are two columns, and that the first one (var) is a factor with levels c(1,2,3,4).

Now, if you want to reorder the factor levels by the sum of their respective values (var2), you can use the fct_reorder function.

To see the difference, compare the result with and without fct_reorder.

First, sum var2 by factor level (var) without using fct_reorder:

> dataframe %>% group_by(var) %>% summarise(var2=sum(var2))
# A tibble: 4 x 2
  var    var2
  <fct> <dbl>
1 1        11
2 2        14
3 3        16
4 4        12

Here we can see that the result is ordered by factor level, not by the sum of var2.

Now use fct_reorder to show the difference.

> dataframe %>% mutate(var=fct_reorder(var,var2,sum)) %>%
+ group_by(var) %>% summarise(var2=sum(var2))
# A tibble: 4 x 2
  var    var2
  <fct> <dbl>
1 1        11
2 4        12
3 2        14
4 3        16

The groups now appear in ascending order of the sum.
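The table is sorted because summarise() lists the groups in level order, and fct_reorder changed that order. A short sketch to check this directly with levels(), assuming a recent forcats and dplyr; it uses tibble() in place of the older data_frame(), and the name `reordered` is only for illustration.

```r
library(forcats)
library(dplyr)

# Same data as above.
dataframe <- tibble(
  var  = factor(c(1, 2, 3, 2, 3, 1, 4, 1, 2, 3, 4)),
  var2 = c(1, 5, 4, 2, 6, 2, 9, 8, 7, 6, 3)
)

levels(dataframe$var)   # "1" "2" "3" "4"

# Reorder the levels by the per-level sum of var2 (11, 14, 16, 12).
reordered <- dataframe %>% mutate(var = fct_reorder(var, var2, .fun = sum))
levels(reordered$var)   # "1" "4" "2" "3"

# The rows themselves are untouched; only the level order changed.
identical(as.character(reordered$var), as.character(dataframe$var))   # TRUE
```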

Likewise, fct_reorder can be used to control the order of the groups in a plot (boxplot, bar chart, etc.).
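As a rough illustration of the plotting case, here is a sketch that assumes ggplot2 is installed (it is not used anywhere in the original posts). The choice of max as the summary function and the names `plot_data` and `p` are arbitrary.

```r
library(forcats)
library(dplyr)
library(ggplot2)   # assumption: ggplot2 is the plotting package in use

dataframe <- tibble(
  var  = factor(c(1, 2, 3, 2, 3, 1, 4, 1, 2, 3, 4)),
  var2 = c(1, 5, 4, 2, 6, 2, 9, 8, 7, 6, 3)
)

# Order the groups by the largest var2 value in each level (8, 7, 6, 9).
plot_data <- dataframe %>% mutate(var = fct_reorder(var, var2, .fun = max))
levels(plot_data$var)   # "3" "2" "1" "4"

# A discrete axis follows the level order, so the boxes are drawn as 3, 2, 1, 4.
p <- ggplot(plot_data, aes(x = var, y = var2)) + geom_boxplot()
# print(p) displays the plot
```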

  • In the RStudio help for `fct_reorder2()`, I copied this example: `df <- tibble::tribble(~color, ~a, ~b, "blue", 1, 2, "green", 6, 2, "purple", 3, 3, "red", 2, 3, "yellow", 5, 1)`. After `df$color <- factor(df$color)` the levels are blue green purple red yellow; `fct_reorder(df$color, df$a, min)` gives Levels: blue red purple yellow green; `fct_reorder2(df$color, df$a, df$b)` gives Levels: purple red blue green yellow. The last one is hard to understand. Can you help me understand? – Steve Dec 10 '20 at 04:21
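One possible reading of that last result, offered as a sketch rather than a definitive account: according to the forcats documentation, fct_reorder2 defaults to `.fun = last2` and `.desc = TRUE`. `last2(.x, .y)` returns the `.y` value paired with the largest `.x`, and because each colour has a single row here, the summary is simply `b`. The names `summary_b` and `by_hand` are illustrative only.

```r
library(forcats)

df <- tibble::tribble(
  ~color,   ~a, ~b,
  "blue",    1,  2,
  "green",   6,  2,
  "purple",  3,  3,
  "red",     2,  3,
  "yellow",  5,  1
)
df$color <- factor(df$color)

# Per-level summary: the b value that goes with the largest a in each level.
summary_b <- tapply(
  seq_along(df$a), df$color,
  function(i) last2(df$a[i], df$b[i])
)
# blue 2, green 2, purple 3, red 3, yellow 1

# Sort descending; tied levels keep their original relative order.
by_hand <- names(summary_b)[order(summary_b, decreasing = TRUE)]

reordered <- fct_reorder2(df$color, df$a, df$b)
levels(reordered)   # "purple" "red" "blue" "green" "yellow"
```

The descending default is aimed at line plots, where it makes the legend order match the order of the line ends at the right edge of the plot.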