I have a data frame of transect data: each transect has species codes, and each species has an associated count. I'm trying to calculate the proportion of transects that identify a particular species by splitting the data frame by transect. How can I take a vector made of repeating chunks of numbers, split it into runs of identical values, and get the indices of each run?

Example:

x <- c(1, 2, 1, 2, 3, 1)
y <- c(3, 2, 3, 3, 2, 3)
Transects <- rep(x, y)

I want it to output chunks like these:

c(1, 1, 1)
c(2, 2)
c(1, 1, 1)
c(2, 2, 2)
c(3, 3)
c(1, 1, 1)

or more importantly, the associated indices, which would give me

c(1, 2, 3)
c(4, 5)
c(6, 7, 8)
c(9, 10, 11)
c(12, 13)
c(14, 15, 16) 

I don't even know which functions to try, because I don't know at which indices to split the vector. I also can't split simply by value, because the same value appears in several chunks, and those chunks are different transects that must not be mixed together. Any help is appreciated; I wouldn't even know how to go about building a function that could do this.

markus
Vic
  • `rleid` from `data.table` will be useful: `split(Transects, data.table::rleid(Transects))`. Might be helpful: https://stackoverflow.com/questions/33507868/is-there-a-dplyr-equivalent-to-data-tablerleid – markus Jul 02 '20 at 18:08

2 Answers


You can do:

split(Transects, with(rle(Transects), rep(seq_along(values), lengths)))

$`1`
[1] 1 1 1

$`2`
[1] 2 2

$`3`
[1] 1 1 1

$`4`
[1] 2 2 2

$`5`
[1] 3 3

$`6`
[1] 1 1 1

Or if interested in indices:

split(seq_along(Transects), with(rle(Transects), rep(seq_along(values), lengths)))

$`1`
[1] 1 2 3

$`2`
[1] 4 5

$`3`
[1] 6 7 8

$`4`
[1]  9 10 11

$`5`
[1] 12 13

$`6`
[1] 14 15 16
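
For the original goal (the proportion of transects that record a given species), the same run ids can serve as a grouping variable. This is only a minimal sketch: the data frame `df`, its columns `Transect`, `Species`, and `Count`, and all of the values in it are made up for illustration and do not come from the question.

```r
# Hypothetical data: column names and values are assumptions for illustration
Transects <- rep(c(1, 2, 1, 2, 3, 1), c(3, 2, 3, 3, 2, 3))
df <- data.frame(
  Transect = Transects,
  Species  = c("A", "B", "C", "A", "C", "B", "C", "D",
               "A", "B", "C", "B", "D", "A", "C", "D"),
  Count    = 1
)

# One id per run of identical transect numbers
run_id <- with(rle(df$Transect), rep(seq_along(values), lengths))

# For each run (i.e. each transect), does species "A" appear at all?
has_A <- tapply(df$Species == "A", run_id, any)

# Proportion of transects in which species "A" was recorded
prop_A <- mean(has_A)
```

Grouping by `run_id` rather than by `df$Transect` keeps the three separate transects numbered 1 apart.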

Alternatively, you can do:

split(Transects, cumsum(c(0, diff(Transects)) != 0))
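
This works because `diff(Transects)` is non-zero exactly where the value changes, so the running count of those changes goes up by one at the start of each new run. A sketch of the same idea, broken into steps and applied to the indices instead of the values (the intermediate names `changed`, `run_id`, and `idx` are only illustrative). Note that the group labels start at 0 here rather than 1, and that `diff` needs a numeric vector, so for character or factor transect codes the `rle` version is probably the safer choice.

```r
Transects <- rep(c(1, 2, 1, 2, 3, 1), c(3, 2, 3, 3, 2, 3))

# TRUE at each position whose value differs from the previous one
changed <- c(0, diff(Transects)) != 0

# Running count of changes: a run id that starts at 0
run_id <- cumsum(changed)

# Indices belonging to each run
idx <- split(seq_along(Transects), run_id)
```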
tmfmnk
  • @Andrew I'm not sure whether we should consider the x and y vectors. I think they were used just to create the Transects vector. – tmfmnk Jul 02 '20 at 19:13
  • Thank you so much, the second one that returns indices is exactly what I was looking for! – Vic Jul 02 '20 at 20:18
  • I tried to upvote your response, but I don't have enough reputation for it to change the public score yet, sorry. – Vic Jul 02 '20 at 20:31
  • Don't worry about that, I'm glad it works for you :) – tmfmnk Jul 02 '20 at 20:36

You can use the `map2` function from the purrr package:

purrr::map2(x, y, rep)
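
If the indices are needed as well, they can be derived from the chunk lengths in `y` alone. A minimal sketch in base R, assuming `x` and `y` are available as in the question (the names `starts`, `ends`, and `idx` are only illustrative):

```r
x <- c(1, 2, 1, 2, 3, 1)
y <- c(3, 2, 3, 3, 2, 3)

# Last index of each chunk, and the first index derived from it
ends   <- cumsum(y)
starts <- ends - y + 1

# One index vector per chunk
idx <- Map(seq, starts, ends)
```

This relies on `y` holding the true run lengths; if only the `Transects` vector exists, they would have to be recovered first, for example with `rle(Transects)$lengths`.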

det
  • That breaks up the chunks the way I want, but doesn't get me the indices for them so that I can manipulate the counts of individual transects. – Vic Jul 06 '20 at 00:57