0

I'm converting an from long to wide format, with NA values. And trying to add new rows that perform operations that "skip" the NA values.

I can use spread(), but ultimately want to perform operations on each measurement that is collected across multiple years. In the example below, this may contain two or more values. For example meas == 4 contains three values, where the other measurements contain two:

df <- data.frame(meas=c(1,1,1,2,2,2,3,3,3,4,4,4), 
             year=rep(c("y2016", "y2017","y2018")), 
             value=c(1,2,NA,NA,1,4,1,NA,3,1,5,10))

spread(df, year, value)
  meas y2016 y2017 y2018
1    1     1     2    NA
2    2    NA     1     4
3    3     1    NA     3
4    4     1     5    10

What I'm ultimately looking for is something between a long and a wide dataset, as shown here:

  meas year1 year2 value1 value2
1    1  2016  2017      1      2
2    2  2017  2018      1      4
3    3  2016  2018      1      3
4    4  2016  2017      1      5
5    4  2017  2018      5     10
massisenergy
  • 1,764
  • 3
  • 14
  • 25
Matt R
  • 1
  • 1
  • 2

1 Answers1

0

I don't think you need to pivot the data.frame, using lead thoughtfully should suffice:

df %>%
  mutate(year = stringr::str_extract(year, "[0-9]+")) %>%
  filter(!is.na(value)) %>%
  group_by(meas) %>%
  mutate(year2 = lead(year),
         value2 = lead(value, order_by = year)) %>%
  filter(!is.na(value2)) %>%
  select(meas, year, year2, value, value2)

# A tibble: 5 x 5
# Groups:   meas [4]
   meas year  year2 value value2
  <dbl> <chr> <chr> <dbl>  <dbl>
1     1 2016  2017      1      2
2     2 2017  2018      1      4
3     3 2016  2018      1      3
4     4 2016  2017      1      5
5     4 2017  2018      5     10
zack
  • 5,205
  • 1
  • 19
  • 25