
I'm grouping a data frame by the column "month", and then summarising the "users" column.

Using this code:

Count_Users_By_Month <- Users_By_Month %>% group_by(month) %>% 
  summarise(Users = length(unique(users)))

I get this result, which I'm sure is correct:

     month       Users
1 Diciembre      4916
2 Noviembre      3527

Question 1: How can I add a column showing the variation in "Diciembre" relative to "Noviembre" (as a percentage)?

I need to create a column for the month-to-month variation.

The formula (pseudocode) is this one:

(DiciembreUsers-NoviembreUsers)/NoviembreUsers

** Of course, the value for Noviembre would be empty, because there is no data for the previous month (October).
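As a quick sanity check, the formula can be evaluated by hand with the two counts shown above. This is only a sketch of the arithmetic; the variable names are made up for illustration and are not part of the original data frame.

```r
# Counts taken from the summarised table in the question.
diciembre_users <- 4916
noviembre_users <- 3527

# Variation of Diciembre relative to Noviembre, as a fraction.
variation <- (diciembre_users - noviembre_users) / noviembre_users

# Same value expressed as a percentage, rounded to two decimals.
variation_pct <- round(variation * 100, 2)
print(variation_pct)
```

Whatever code adds the new column should produce this value in the "Diciembre" row.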

I tried the following code, but it gives an error:

Count_Users_By_Month <- Users_By_Month %>% group_by(month) %>% 
  summarise(Users = length(unique(users))) %>%
  mutate(Variacion = (Count_Users_By_Month[1,2]-Count_Users_By_Month[2,2])/Count_Users_By_Month[2,2])

Error: not compatible with STRSXP

**Last edit:

Problem solved, thanks to @Khashaa (see comments):

I changed "lag" to "lead" and it worked. I also added "lead" to the division part to get the formula right.

mutate(variation=(Users-lead(Users))/lead(Users))
Omar Gonzales
  • `mutate(variation=(Users-lag(Users))/Users)` – Khashaa Jan 08 '15 at 05:25
  • @akrun The dataset is the first one, it just shows the users for "Diciembre" and "Noviembre". I need to show also the Variation from Diciembre to Noviembre. – Omar Gonzales Jan 08 '15 at 05:42
  • If your data.frame is reversed (row-wise), does it work correctly? (Referring to a just-deleted comment where you said @Khashaa's results show in Nov and should show in Dec.) – r2evans Jan 08 '15 at 05:42
  • `arrange` your data chronologically, or use `lead` instead of `lag` – Khashaa Jan 08 '15 at 05:44
  • @r2evans No, I changed it row-wise in Excel and the result should be -0.28, but Khashaa's code gives -0.39 – Omar Gonzales Jan 08 '15 at 05:46
  • @Khashaa It works with lead instead of lag. I'll investigate the difference between those two. Problem solved. Thanks! – Omar Gonzales Jan 08 '15 at 05:47

1 Answer

This is the summarised data frame from the question:

    month       Users
1 Diciembre      4916
2 Noviembre      3527

This is the answer:

Count_Users_By_Month <- Users_By_Month %>% group_by(month) %>% 
                        summarise(Users = length(unique(users))) %>%
                        mutate(variation=(Users-lead(Users))/lead(Users))

I still need to investigate how "lead" works. All credit goes to @Khashaa; see his comments on the question. I only modified his formula by using "lead" in the division part as well, to get the right answer.
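For anyone else wondering about the difference, here is a minimal sketch of what `lead` and `lag` do, assuming the summarised table is rebuilt by hand (the `counts` data frame and the helper column names are only for illustration). `lead(x)` gives each row the value from the next row, and `lag(x)` gives each row the value from the previous row. The summarised output is sorted by the group key, so "Diciembre" comes before "Noviembre", which is why the previous month sits in the next row here.

```r
library(dplyr)

# Summarised table from the question, in the row order it was printed.
counts <- data.frame(
  month = c("Diciembre", "Noviembre"),
  Users = c(4916, 3527),
  stringsAsFactors = FALSE
)

# With Diciembre first, the previous month is in the NEXT row, so use lead().
with_lead <- counts %>%
  mutate(
    next_row  = lead(Users),  # value from the row below (NA in the last row)
    prev_row  = lag(Users),   # value from the row above (NA in the first row)
    variation = (Users - lead(Users)) / lead(Users)
  )
print(with_lead)
# Diciembre: (4916 - 3527) / 3527, roughly 0.394; Noviembre: NA

# Alternative: put the rows in chronological order first, then use lag().
with_lag <- counts %>%
  arrange(factor(month, levels = c("Noviembre", "Diciembre"))) %>%
  mutate(variation = (Users - lag(Users)) / lag(Users))
print(with_lag)
```

Both versions give the same variation for "Diciembre"; the second one is probably safer with more than two months, since it does not depend on the alphabetical order of the month names.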

Omar Gonzales