
I am trying to split a single column of integers called scores (read from a .csv file) into chunks, not necessarily of equal size because the real data may vary. Within each chunk I then want to count the runs of a chosen value (e.g. 1), or compute the mean length of those runs. Both are possible with rle.

I can easily split the column of integers using split; however, this seems incompatible with rle (presumably because split returns a list). I looked for solutions and alternatives to rle but didn't find anything.

Example Scores

scores <- c(1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1)

Split them

g <- seq_along(scores)

scores.div <- split(scores, ceiling(g/7))

Example of what I tried, which didn't work

Scores.rle <- sapply(scores.div, function(x) {
  r <- rle(x)
  sum(r$values == 1)
})

I'd expect some output like this:

2 2 0 1 1

Any help is greatly appreciated.

3 Answers


We could also use tapply

as.vector(tapply(scores, ceiling(g/7), FUN = function(x) sum(rle(x)$values == 1)))
#[1] 2 2 0 1 1
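
The question also asks for the mean length of the runs. A minimal sketch of that, assuming the same chunks of 7; the helper name mean_run_length and the choice to return NA for a chunk with no matching run are illustrative assumptions, not part of the original answer:

```r
scores <- c(1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2,
            3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1)
chunk_id <- ceiling(seq_along(scores) / 7)

# Hypothetical helper: mean length of the runs equal to `value`,
# or NA when the chunk contains no such run.
mean_run_length <- function(x, value = 1) {
  r <- rle(x)
  len <- r$lengths[r$values == value]
  if (length(len) == 0) NA_real_ else mean(len)
}

res <- as.vector(tapply(scores, chunk_id, mean_run_length))
res
#[1] 2.0 2.5  NA 4.0 2.0
```

The third chunk contains no 1s, hence the NA.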
akrun

I ran your code and it works as expected.

> scores <- c(1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1)
> g <- seq_along(scores)
> scores.div <- split(scores, ceiling(g/7))
> Scores.rle <- sapply(scores.div, function(x) {
+   r <- rle(x)
+   sum(r$values == 1)
+ })
> Scores.rle
1 2 3 4 5 
2 2 0 1 1

and my session is:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1   
Navy Cheng

As explained in the docs, sapply is a wrapper around lapply, so it returns a named vector here: the names are those of the list created by split.

sapply is a user-friendly version and wrapper of lapply by default returning a vector, matrix or, if simplify = "array", an array if appropriate, by applying simplify2array(). sapply(x, f, simplify = FALSE, USE.NAMES = FALSE) is the same as lapply(x, f)

Simply unname the result (see ?unname) and you're done:

> scores <- c(1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1)
> g <- seq_along(scores)
> scores.div <- split(scores, ceiling(g/7))
> unname(sapply(scores.div, function(x) sum(rle(x)$values ==1)))
[1] 2 2 0 1 1
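
Since the question mentions that the chunks need not be even, here is a sketch of the same pattern with an arbitrary grouping vector. The chunk sizes in sizes are made up for illustration and would have to come from the real data:

```r
scores <- c(1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2,
            3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1)

# Hypothetical uneven chunk sizes; they must add up to length(scores).
sizes <- c(10, 5, 15)
stopifnot(sum(sizes) == length(scores))

# One group label per element: 1 repeated 10 times, 2 five times, 3 fifteen times.
chunk_id <- rep(seq_along(sizes), times = sizes)

# Count the runs of 1 in each chunk.
counts <- unname(sapply(split(scores, chunk_id),
                        function(x) sum(rle(x)$values == 1)))
counts
#[1] 3 2 1
```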
Spätzle