
I have data that looks like this (the real data has more rows and columns; this is a summary):

class section
a NA
b s1
c NA
d NA

a NA
b s2
c NA
d NA

a NA
b s3
c NA
d NA

Class a always comes before b, and c and d always come after b. The rows form groups, so each run of a, b, c, d is a separate group.

What I want to do is assign a, c and d the section that the b in their group belongs to, so the final data should look like this:

class section
a s1
b s1
c s1
d s1

a s2
b s2
c s2
d s2

a s3
b s3
c s3
d s3

I tried a for loop with if/else, but I have so many rows that it takes too long. I would also like to learn whether there is an efficient way to do this with dplyr.

I appreciate any help, thank you!

  • Downvote for failing to produce an unambiguous version of the data; a description of what it "looks like" is what prompted Stefan to make an effort at coding. Your failure to use `dput` or other code to create an [MCVE] led to a comment chain that was unproductive to future users. – IRTFM Jul 27 '23 at 01:07

1 Answer

One option would be to first add an identifier column for the group and afterwards fill the section value with the value of the b class per group.

dat <- data.frame(
  class = rep(letters[1:4], 3),
  section = rep(paste0("s", 1:3), each = 4)
)
dat$section[dat$class != "b"] <- NA

dat
#>    class section
#> 1      a    <NA>
#> 2      b      s1
#> 3      c    <NA>
#> 4      d    <NA>
#> 5      a    <NA>
#> 6      b      s2
#> 7      c    <NA>
#> 8      d    <NA>
#> 9      a    <NA>
#> 10     b      s3
#> 11     c    <NA>
#> 12     d    <NA>

library(dplyr)

dat |>
  # Add identifier for the group. Grouping by `class` the first a, b, c, d gets assigned group = 1, second group = 2, ...
  mutate(group = row_number(), .by = class) |>
  # For each group replace the section value with the one for class  `b`
  mutate(section = section[class %in% "b"], .by = group) |>
  # Remove the group column
  select(-group)
#>    class section
#> 1      a      s1
#> 2      b      s1
#> 3      c      s1
#> 4      d      s1
#> 5      a      s2
#> 6      b      s2
#> 7      c      s2
#> 8      d      s2
#> 9      a      s3
#> 10     b      s3
#> 11     c      s3
#> 12     d      s3

EDIT: For older versions of dplyr, i.e. < 1.1.0, you have to use group_by:

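dat |>
  # Number the rows within each class: first a, b, c, d get group = 1, ...
  group_by(class) |>
  mutate(group = row_number()) |>
  group_by(group) |>
  # Within each group, use the section value of the `b` row
  mutate(section = section[class %in% "b"]) |>
  # Drop the grouping first; select() cannot remove a grouping column
  ungroup() |>
  select(-group)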
stefan
  • Sorry I could not reproduce this example, I got an error saying: Error in `mutate()`: ! Problem while computing `section = section[class %in% "b"]`. ✖ `section` must be size 12 or 1, not 3. Run `rlang::last_error()` to see where the error occurred. – cookiemonster Jul 26 '23 at 20:53
  • Could you check your `dplyr` version? The `.by` shorthand was introduced with `dplyr >= 1.1.0`. In older versions this will result in the error you mention. If that is the issue you could update or use `group_by`. See my edit. – stefan Jul 26 '23 at 20:58
  • It worked on the example data, so now I am trying to apply it to my real data. Can you explain what the functions do on each line? Thank you! – cookiemonster Jul 26 '23 at 21:16
  • Just added some explanatory notes to the code. Hope that makes it clearer. – stefan Jul 26 '23 at 21:33
  • Thank you! The only problem seems to be at mutate(section = section[class %in% "b"]) part, I get an error saying: Error in `mutate()`: ! Problem while computing `section = section[class %in% "b"]`. ✖ `section` must be size 3 or 1, not 0. ℹ The error occurred in group 100: group = 100. Run `rlang::last_error()` to see where the error occurred. – cookiemonster Jul 26 '23 at 21:37
  • Hm. Issue is most likely that there is no feature with value `"mRNA"` in the last group. Will have a look. Probably we need some kind of `ifelse` to account for that. – stefan Jul 26 '23 at 21:40
  • You could try with `mutate(section = ifelse(any(class %in% "b"), section[class %in% "b"], section))`. The if else will first check whether there is a `b` class and if not will keep the section value as is, i.e. keeps the NAs. – stefan Jul 26 '23 at 21:44
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/254682/discussion-between-cookiemonster-and-stefan). – cookiemonster Jul 26 '23 at 21:55
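
The "must be size 3 or 1, not 0" error discussed in the comments can be reproduced and avoided with a small variation. This is only a sketch, not the approach from the answer: it assumes dplyr >= 1.1.0 and that every group starts with an `a` row (as the question states), and the sample data with a missing `b` row is hypothetical. Instead of numbering rows within each class, it starts a new group at every `a`, and it indexes the `b` value with `[1]` so that a group without a `b` row yields `NA` rather than a zero-length vector.

```r
library(dplyr)

# Hypothetical data: the last group has no "b" row, which is the situation
# that triggers the size-0 error in the comments above.
dat <- data.frame(
  class   = c("a", "b", "c", "d", "a", "b", "c", "d", "a", "c", "d"),
  section = c(NA, "s1", NA, NA, NA, "s2", NA, NA, NA, NA, NA)
)

filled <- dat |>
  # Start a new group at every "a" (assumes each group begins with "a")
  mutate(group = cumsum(class == "a")) |>
  # Take the first "b" section in the group; indexing with [1] gives NA
  # when the group has no "b" row, so the result always has length 1
  mutate(section = section[class == "b"][1], .by = group) |>
  # Remove the helper column
  select(-group)

filled
```

With this data the first two groups are filled with s1 and s2, and the third group, which has no `b` row, keeps `NA` in every row. Grouping on `cumsum(class == "a")` also stays aligned when a group in the middle of the data is missing a class, which numbering rows within each class would not.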