Issues with "group_by" and "mutate"

Question

I have the following dataframe:

df = structure(list(Day = c(19L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 
21L, 21L), Month = c(9, 9, 9, 9, 9, 9, 9, 9, 9, 9), Year = c(2004, 
2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004), Date = c("2004-09-19", 
"2004-09-20", "2004-09-20", "2004-09-20", "2004-09-20", "2004-09-21", 
"2004-09-21", "2004-09-21", "2004-09-21", "2004-09-21"), Outlet = c("Le Monde", 
"Financial Times", "Corriere della Sera", "Frankfurter Allgemeine Zeitung", 
"El Pais", "Financial Times", "La Tribune", "Financial Times", 
"Borsen-Zeitung", "Borsen-Zeitung"), Country = c("France", "International", 
"Italy", "Germany", "Spain", "Germany", "France", "Germany", 
"Germany", "Germany")), row.names = c("text1", "text2", "text3", 
"text4", "text5", "text6", "text7", "text8", "text9", "text10"
), class = "data.frame")

       Day Month Year       Date                         Outlet       Country
text1   19     9 2004 2004-09-19                       Le Monde        France
text2   20     9 2004 2004-09-20                Financial Times International
text3   20     9 2004 2004-09-20            Corriere della Sera         Italy
text4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany
text5   20     9 2004 2004-09-20                        El Pais         Spain
text6   21     9 2004 2004-09-21                Financial Times       Germany
text7   21     9 2004 2004-09-21                     La Tribune        France
text8   21     9 2004 2004-09-21                Financial Times       Germany
text9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany
text10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany

I want to have an idex for every row that has the same Date as below:


       Day Month Year       Date                         Outlet       Country   ID
text1   19     9 2004 2004-09-19                       Le Monde        France    1
text2   20     9 2004 2004-09-20                Financial Times International    2
text3   20     9 2004 2004-09-20            Corriere della Sera         Italy    2
text4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany    2
text5   20     9 2004 2004-09-20                        El Pais         Spain    2
text6   21     9 2004 2004-09-21                Financial Times       Germany    3
text7   21     9 2004 2004-09-21                     La Tribune        France    3
text8   21     9 2004 2004-09-21                Financial Times       Germany    3
text9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany    3
text10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany    3

To do it, I do:

library(dplyr)

df %>% group_by(Date) %>% mutate(id = row_number())

      Day Month  Year Date       Outlet                         Country          id
       
 1    19     9  2004 2004-09-19 Le Monde                       France            1
 2    20     9  2004 2004-09-20 Financial Times                International     1
 3    20     9  2004 2004-09-20 Corriere della Sera            Italy             2
 4    20     9  2004 2004-09-20 Frankfurter Allgemeine Zeitung Germany           3
...

Yet, it doesn't work. I don't manage to understand why.

Can anyone help me solve this?

Thanks a lot!

score 2 · Accepted Answer · answered Jan 12 '21 at 13:59

We can use cur_group_id after grouping by 'Date'

library(dplyr)
df %>%
    group_by(Date) %>% 
    mutate(ID = cur_group_id()) %>%
    ungroup

Or another option is match without grouping

df %>%
   mutate(ID = match(Date, unique(Date)))

Or using base R

df$ID <- with(df, match(Date, unique(Date)))

score 1 · Answer 2 · answered Jan 12 '21 at 14:11

using rleid

df %>% 
  mutate(id = data.table::rleid(Date))

   Day Month Year       Date                         Outlet       Country id
1   19     9 2004 2004-09-19                       Le Monde        France  1
2   20     9 2004 2004-09-20                Financial Times International  2
3   20     9 2004 2004-09-20            Corriere della Sera         Italy  2
4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany  2
5   20     9 2004 2004-09-20                        El Pais         Spain  2
6   21     9 2004 2004-09-21                Financial Times       Germany  3
7   21     9 2004 2004-09-21                     La Tribune        France  3
8   21     9 2004 2004-09-21                Financial Times       Germany  3
9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany  3
10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany  3

AnilGoyal · Answer 3 · 2021-01-12T14:43:31.703

I think no need to apply group_by. simply do this, using dense_rank

mutate(df, ID = dense_rank(Date))

> mutate(df, ID = dense_rank(Date))
   Day Month Year       Date                         Outlet       Country ID
1   19     9 2004 2004-09-19                       Le Monde        France  1
2   20     9 2004 2004-09-20                Financial Times International  2
3   20     9 2004 2004-09-20            Corriere della Sera         Italy  2
4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany  2
5   20     9 2004 2004-09-20                        El Pais         Spain  2
6   21     9 2004 2004-09-21                Financial Times       Germany  3
7   21     9 2004 2004-09-21                     La Tribune        France  3
8   21     9 2004 2004-09-21                Financial Times       Germany  3
9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany  3
10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany  3

row_number creates the index of all rows within the group whereas dense_rank assigns each group a unique index

Issues with "group_by" and "mutate"

3 Answers3