1

I have the following dataframe:

df = structure(list(Day = c(19L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 
21L, 21L), Month = c(9, 9, 9, 9, 9, 9, 9, 9, 9, 9), Year = c(2004, 
2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004), Date = c("2004-09-19", 
"2004-09-20", "2004-09-20", "2004-09-20", "2004-09-20", "2004-09-21", 
"2004-09-21", "2004-09-21", "2004-09-21", "2004-09-21"), Outlet = c("Le Monde", 
"Financial Times", "Corriere della Sera", "Frankfurter Allgemeine Zeitung", 
"El Pais", "Financial Times", "La Tribune", "Financial Times", 
"Borsen-Zeitung", "Borsen-Zeitung"), Country = c("France", "International", 
"Italy", "Germany", "Spain", "Germany", "France", "Germany", 
"Germany", "Germany")), row.names = c("text1", "text2", "text3", 
"text4", "text5", "text6", "text7", "text8", "text9", "text10"
), class = "data.frame")

       Day Month Year       Date                         Outlet       Country
text1   19     9 2004 2004-09-19                       Le Monde        France
text2   20     9 2004 2004-09-20                Financial Times International
text3   20     9 2004 2004-09-20            Corriere della Sera         Italy
text4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany
text5   20     9 2004 2004-09-20                        El Pais         Spain
text6   21     9 2004 2004-09-21                Financial Times       Germany
text7   21     9 2004 2004-09-21                     La Tribune        France
text8   21     9 2004 2004-09-21                Financial Times       Germany
text9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany
text10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany


I want to have an idex for every row that has the same Date as below:


       Day Month Year       Date                         Outlet       Country   ID
text1   19     9 2004 2004-09-19                       Le Monde        France    1
text2   20     9 2004 2004-09-20                Financial Times International    2
text3   20     9 2004 2004-09-20            Corriere della Sera         Italy    2
text4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany    2
text5   20     9 2004 2004-09-20                        El Pais         Spain    2
text6   21     9 2004 2004-09-21                Financial Times       Germany    3
text7   21     9 2004 2004-09-21                     La Tribune        France    3
text8   21     9 2004 2004-09-21                Financial Times       Germany    3
text9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany    3
text10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany    3

To do it, I do:

library(dplyr)

df %>% group_by(Date) %>% mutate(id = row_number())

      Day Month  Year Date       Outlet                         Country          id
       
 1    19     9  2004 2004-09-19 Le Monde                       France            1
 2    20     9  2004 2004-09-20 Financial Times                International     1
 3    20     9  2004 2004-09-20 Corriere della Sera            Italy             2
 4    20     9  2004 2004-09-20 Frankfurter Allgemeine Zeitung Germany           3
...

Yet, it doesn't work. I don't manage to understand why.

Can anyone help me solve this?

Thanks a lot!

AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
Rollo99
  • 1,601
  • 7
  • 15

3 Answers3

2

We can use cur_group_id after grouping by 'Date'

library(dplyr)
df %>%
    group_by(Date) %>% 
    mutate(ID = cur_group_id()) %>%
    ungroup

Or another option is match without grouping

df %>%
   mutate(ID = match(Date, unique(Date)))

Or using base R

df$ID <- with(df, match(Date, unique(Date))) 
akrun
  • 874,273
  • 37
  • 540
  • 662
1

using rleid

df %>% 
  mutate(id = data.table::rleid(Date))

   Day Month Year       Date                         Outlet       Country id
1   19     9 2004 2004-09-19                       Le Monde        France  1
2   20     9 2004 2004-09-20                Financial Times International  2
3   20     9 2004 2004-09-20            Corriere della Sera         Italy  2
4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany  2
5   20     9 2004 2004-09-20                        El Pais         Spain  2
6   21     9 2004 2004-09-21                Financial Times       Germany  3
7   21     9 2004 2004-09-21                     La Tribune        France  3
8   21     9 2004 2004-09-21                Financial Times       Germany  3
9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany  3
10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany  3
Yuriy Saraykin
  • 8,390
  • 1
  • 7
  • 14
1

I think no need to apply group_by. simply do this, using dense_rank

mutate(df, ID = dense_rank(Date))

> mutate(df, ID = dense_rank(Date))
   Day Month Year       Date                         Outlet       Country ID
1   19     9 2004 2004-09-19                       Le Monde        France  1
2   20     9 2004 2004-09-20                Financial Times International  2
3   20     9 2004 2004-09-20            Corriere della Sera         Italy  2
4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany  2
5   20     9 2004 2004-09-20                        El Pais         Spain  2
6   21     9 2004 2004-09-21                Financial Times       Germany  3
7   21     9 2004 2004-09-21                     La Tribune        France  3
8   21     9 2004 2004-09-21                Financial Times       Germany  3
9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany  3
10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany  3

row_number creates the index of all rows within the group whereas dense_rank assigns each group a unique index

AnilGoyal
  • 25,297
  • 4
  • 27
  • 45