repeat row data with categorical variable a varying number of times

Question

I have a dataframe that looks like this:

DATE    SURVEY  SPECIES COUNT
07/12/16    13  Paka    1
07/12/16    14  Paka    1
07/12/16    15  Grouper 2
07/12/16    16  Grouper 3
07/12/16    18  Grouper 1
07/12/16    20  Lehi    3

Where 'Count' is the number of individuals observed for each given survey. I'd like to reshape the data so that there is a single row (record) for each individual: e.g.:

DATE    SURVEY  SPECIES
07/12/16    13  Paka
07/12/16    14  Paka
07/12/16    15  Grouper
07/12/16    15  Grouper
07/12/16    16  Grouper
07/12/16    16  Grouper
07/12/16    16  Grouper
07/12/16    18  Grouper
07/12/16    20  Lehi
07/12/16    20  Lehi
07/12/16    20  Lehi

With the data for each survey replicated as many times as needed. I considered using the melt() or summarise() functions but am stuck because the COUNT variable ranges anywhere from 1 - 10.

Thanks

Zheyuan Li · Accepted Answer · 2016-08-03T00:20:03.180

Suppose you original data frame is dat, You can do the following to get your desired data frame new_dat:

new_dat <- dat[rep.int(1:nrow(dat), dat$COUNT), -4]
rownames(new_dat) <- 1:nrow(new_dat)

The key is the row indexing vector rep.int(1:nrow(dat), dat$COUNT), while the -4 only signals dropping the 4th column, i.e., column COUNT.

Test

x <- "DATE    SURVEY  SPECIES COUNT
      07/12/16    13  Paka    1
      07/12/16    14  Paka    1
      07/12/16    15  Grouper 2
      07/12/16    16  Grouper 3
      07/12/16    18  Grouper 1
      07/12/16    20  Lehi    3"

dat <- read.table(text = x, header = TRUE)

#DATE   SURVEY  SPECIES COUNT
#07/12/16   13  Paka    1
#07/12/16   14  Paka    1
#07/12/16   15  Grouper 2
#07/12/16   16  Grouper 3
#07/12/16   18  Grouper 1
#07/12/16   20  Lehi    3

new_dat <- dat[rep.int(1:nrow(dat), dat$COUNT), -4]
rownames(new_dat) <- 1:nrow(new_dat)

#        DATE SURVEY SPECIES
#1   07/12/16     13    Paka
#2   07/12/16     14    Paka
#3   07/12/16     15 Grouper
#4   07/12/16     15 Grouper
#5   07/12/16     16 Grouper
#6   07/12/16     16 Grouper
#7   07/12/16     16 Grouper
#8   07/12/16     18 Grouper
#9   07/12/16     20    Lehi
#10  07/12/16     20    Lehi
#11  07/12/16     20    Lehi

Worked like a charm! Thanks :) – makai Aug 03 '16 at 00:15 — makai, Aug 03 '16 at 00:15

score 1 · Answer 2 · answered Aug 03 '16 at 03:01

1

Here is another option with expandRows

library(splitstackshape)
`row.names<-`(expandRows(df1, "COUNT"), NULL)
#      DATE SURVEY SPECIES
#1  07/12/16     13    Paka
#2  07/12/16     14    Paka
#3  07/12/16     15 Grouper
#4  07/12/16     15 Grouper
#5  07/12/16     16 Grouper
#6  07/12/16     16 Grouper
#7  07/12/16     16 Grouper
#8  07/12/16     18 Grouper
#9  07/12/16     20    Lehi
#10 07/12/16     20    Lehi
#11 07/12/16     20    Lehi

answered Aug 03 '16 at 03:01

akrun

874,273
37
540
662

@ZheyuanLi The assignment is to make it one-line making, but basically the `expandRows` is just a wrapper which does the same thing as in your code. However, people are posting wrappers as a unique solution and getting more points. Example in point is [here](http://stackoverflow.com/questions/38711057/extracting-a-number-of-a-string-of-varying-lengths/38711116#38711116) – akrun Aug 03 '16 at 03:11

repeat row data with categorical variable a varying number of times

2 Answers2