Hi guys,

I have the following data frame:

obj <- data.frame(occ = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
                  Date = c("1990-01", "1990-01", "1990-01", "1990-01", "1990-02", "1990-02", "1990-02", "1990-02", "1990-03", "1990-03", "1990-03", "1990-03", "1990-04", "1990-04", "1990-04", "1990-04"),
                  emp_value = c(33, 0, 55, 44, 0, 50, 70, 80, 91, 32, 32, 22, 11, 31, 42, 51)
)

I would like to generate a variable that holds, for each occupation (occ), the change in emp_value from one date to the next.

My desired dataframe would be

obj <- data.frame(occ = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
                  Date = c("1990-01", "1990-01", "1990-01", "1990-01", "1990-02", "1990-02", "1990-02", "1990-02", "1990-03", "1990-03", "1990-03", "1990-03", "1990-04", "1990-04", "1990-04", "1990-04"),
                  emp_value = c(33, 0, 55, 44, 0, 50, 70, 80, 91, 32, 32, 22, 11, 31, 42, 51),
                  emp_diff = c(0, 0, 0, 0, -33, 50, 15, 36, 91, -18, -38, -48, -69, -70, -1, 10)
)

Note that my real data frame has thousands of rows and hundreds of different occupations. In addition, not every occupation appears on every date.

Many thanks in advance!

freddywit

1 Answer

You could use dplyr:

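library(dplyr)
obj %>%
  group_by(occ) %>%   # compute the difference separately for each occupation
  # lag() returns the previous row's value within the group; with default = 0
  # the first row of each group gets emp_value - 0, i.e. emp_value itself
  mutate(emp_diff = emp_value - lag(emp_value, default = 0))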
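If the first observation of each occupation should get a difference of 0, as in the desired output, one possible variant is to use the group's own first value as the `lag()` default. This is only a sketch: it rebuilds the sample data from the question, and it assumes the `Date` strings are always in `YYYY-MM` form, so that sorting them as text is chronological.

```r
library(dplyr)

# Sample data from the question, written compactly with rep()
obj <- data.frame(
  occ = rep(c(1, 2, 3, 4), times = 4),
  Date = rep(c("1990-01", "1990-02", "1990-03", "1990-04"), each = 4),
  emp_value = c(33, 0, 55, 44, 0, 50, 70, 80, 91, 32, 32, 22, 11, 31, 42, 51)
)

res <- obj %>%
  arrange(occ, Date) %>%   # make sure rows are in date order within each occ
  group_by(occ) %>%
  # default = first(emp_value) makes the first row of each group 0
  mutate(emp_diff = emp_value - lag(emp_value, default = first(emp_value))) %>%
  ungroup() %>%
  arrange(Date, occ)       # restore the original Date-then-occ row order

res
```

With the sample data this gives 0 for every occupation in 1990-01 and, for example, 11 - 91 = -80 for occupation 1 in 1990-04. A few of the hand-written `emp_diff` entries in the question differ from these values.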
Martin Gal
  • Thanks for your message! That was almost what I needed. I just had to replace lag(emp_diff, default = 0) with lag(emp_value, default = 0). Thanks man! – freddywit Aug 07 '21 at 10:29
  • Ah... my mistake. Corrected it. – Martin Gal Aug 07 '21 at 10:30
  • Note that this takes the difference between consecutive observations within each occupation. If an occurrence is missing, it subtracts across the gap, for example between `1900-01-01` and `1900-01-03`. – Martin Gal Aug 07 '21 at 10:34
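To illustrate the caveat in the last comment: if differences across a gap are not wanted, one option is to turn each date into a running month number and keep the difference only when the previous row is exactly one month earlier. This is a sketch under assumptions, not part of the answer above: the `obj_gap` data and the `month_idx` helper column are made up for illustration, and `Date` is assumed to be a `YYYY-MM` string.

```r
library(dplyr)

# Hypothetical toy data: occupation 2 has no row for 1990-02
obj_gap <- data.frame(
  occ = c(1, 2, 1, 1, 2),
  Date = c("1990-01", "1990-01", "1990-02", "1990-03", "1990-03"),
  emp_value = c(33, 0, 0, 91, 32)
)

res_gap <- obj_gap %>%
  mutate(
    # running month number, so consecutive months differ by exactly 1
    month_idx = as.integer(substr(Date, 1, 4)) * 12L +
      as.integer(substr(Date, 6, 7))
  ) %>%
  arrange(occ, month_idx) %>%
  group_by(occ) %>%
  mutate(
    # keep the difference only if the previous row is the previous month;
    # first rows and rows after a gap become NA
    emp_diff = if_else(month_idx - lag(month_idx) == 1L,
                       emp_value - lag(emp_value),
                       NA_real_)
  ) %>%
  ungroup()

res_gap
```

Here occupation 2 gets `NA` in 1990-03 instead of 32 - 0, because its previous observation is two months back. Whether `NA`, 0, or the cross-gap difference is the right choice depends on what the variable is used for.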