-1

I have a dataframe (table), that includes frequency counts (Freq) of 2 levels (F, I) of a categorical variable (Fert).

table[1:10]

    FemID Sperm  Week Fert Freq
1:   269  High    1    F    4
2:   269  High    1    I    5
3:   273  High    1    F    6
4:   274  High    1    I    1
5:   275  High    1    I    1
6:   276  High    1    I    1
7:   278   Low    1    I    1
8:   280   Low    1    I    1
9:   281   Low    1    I    1
10:   282   Low    1    I    5

I would like to convert this to a dataframe in which the two levels of Fert (I and F) are separate variables for each value of FemID, with 0 for missing counts of a level, like so:

    FemID Sperm  Week Fert Infert
1:   269  High    1    4    5
2:   273  High    1    6    0
3:   274  High    1    1    0
4:   275  High    1    1    0
5:   276  High    1    1    0

Thoughts or suggestions? I feel like a loop is required, but I'm not sure how to go about setting it up for this. Perhaps there are two parts, one that creates the two new variables and one that fills in the 0's?

2 Answers2

0

You can use spread in tidyr for this:

> library(tidyr)
> df %>% spread(Fert,Freq)
  FemID Sperm Week  F  I
1   269  High    1  4  5
2   273  High    1  6 NA
3   274  High    1 NA  1
4   275  High    1 NA  1
5   276  High    1 NA  1
6   278   Low    1 NA  1
7   280   Low    1 NA  1
8   281   Low    1 NA  1
9   282   Low    1 NA  5

You can also adjust the variable names:

> df %>% spread(Fert,Freq) %>% 
      setNames(c("FemID","Sperm","Week","Fert","Infert"))
  FemID Sperm Week Fert Infert
1   269  High    1    4      5
2   273  High    1    6     NA
3   274  High    1   NA      1
4   275  High    1   NA      1
.... the rest is truncated

And can filter by NAs:

> df %>% spread(Fert,Freq) %>% 
    setNames(c("FemID","Sperm","Week","Fert","Infert")) %>% 
    filter(!is.na(Fert))
  FemID Sperm Week Fert Infert
1   269  High    1    4      5
2   273  High    1    6     NA
mysteRious
  • 4,102
  • 2
  • 16
  • 36
  • On my PC `spread` comes with `tidyr`. May be you have got a special version of `dplyr`:-) – MKR Mar 21 '18 at 19:21
0

Since your data is in data.table hence dcast is an good option:

  library(data.table)
  setDT(df)
  dcast(df, FemID+Sperm+Week~Fert, value.var = "Freq") 

  #OR A shorter way could be as

  dcast(df, ...~Fert, value.var = "Freq")


  #    FemID  Sperm  Week  F  I
  # 1:   269  High    1  4  5
  # 2:   273  High    1  6 NA
  # 3:   274  High    1 NA  1
  # 4:   275  High    1 NA  1
  # 5:   276  High    1 NA  1
  # 6:   278   Low    1 NA  1
  # 7:   280   Low    1 NA  1
  # 8:   281   Low    1 NA  1
  # 9:   282   Low    1 NA  5

Data

  df <- read.table(text = "FemID Sperm  Week Fert Freq
  1:   269  High    1    F    4
  2:   269  High    1    I    5
  3:   273  High    1    F    6
  4:   274  High    1    I    1
  5:   275  High    1    I    1
  6:   276  High    1    I    1
  7:   278   Low    1    I    1
  8:   280   Low    1    I    1
  9:   281   Low    1    I    1
  10:   282   Low    1    I    5", header = TRUE, stringsAsFactors = FALSE)
MKR
  • 19,739
  • 4
  • 23
  • 33