I'm trying to split a vector (which changes every time) into chunks, with the constraint that identical values must end up in the same chunk. The number of chunks can vary, but there should be at least 4, or the chunks should have roughly equal frequencies.

For example, here is a vector:

j = c(1, 11, 1, 2, 1, 1, 1, 2, 4, 6, 3)

Chunking it with chunk(x=sort(j),n=4) gives

$`1`
[1] 1 1 1

$`2`
[1] 1 1 2

$`3`
[1] 2 3

$`4`
[1]  4  6 11

What I want instead is

$`1`
[1] 1 1 1 1 1

$`2`
[1] 2 2 3

$`3`
[1] 4 6

$`4`
[1] 11
  • Where did the `4` come from (in your desired output, 3rd group)? Also, what is the rule? Why 223, and then 46... Why not 22 and then 346? – Sotos Jun 30 '23 at 12:48
  • Hi Jinane! What precisely are you asking? Are you asking what the rules for the `chunk` function should be to produce that result? Are you asking how you can write it? Asking if we could write it? Or something else? – Mark Jun 30 '23 at 12:51
  • What if your vector only has 3 different values? Should there be an 'empty' chunk? Where did the `4` come from? – Paul Stafford Allen Jun 30 '23 at 12:51
  • You could use `table(j)` and assemble the chunks from there using `rep()`? – Paul Stafford Allen Jun 30 '23 at 12:54
  • I already corrected the typo; I copied the results from a different vector, sorry for the confusion. The vector will have at least 4 distinct values, so there is no problem. I am doing a statistical test that needs to divide items into at least 4 groups. – Jinane Jouni Jun 30 '23 at 13:34
  • How do you define "similar groups" in your question title? Why not `split(j,j)`? – ThomasIsCoding Jun 30 '23 at 13:46
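
The `table(j)` idea from the comments could be sketched roughly as follows. This is only one possible rule, since the question does not pin down exactly where the boundaries should fall: it walks through the distinct values in ascending order and closes a chunk once it holds its fair share of the remaining elements, while always leaving at least one distinct value for each chunk still to come. `chunk_ties` is a hypothetical helper name, not an existing function, and the sketch assumes the vector has no `NA`s and at least `n` distinct values.

```r
# Hypothetical helper (not from the question): greedy, roughly
# equal-frequency chunking that never splits identical values.
chunk_ties <- function(x, n) {
  vals <- sort(unique(x))        # distinct values, ascending
  cnt  <- as.vector(table(x))    # their counts, in the same order
  k    <- length(vals)
  stopifnot(k >= n)              # assumption: at least n distinct values

  grp     <- integer(k)          # chunk id for each distinct value
  g       <- 1                   # chunk currently being filled
  size    <- 0                   # elements in the current chunk
  pending <- length(x)           # elements not yet in a closed chunk

  for (i in seq_len(k)) {
    grp[i] <- g
    size   <- size + cnt[i]
    chunks_left <- n - g         # chunks that have not been started yet
    values_left <- k - i         # distinct values not yet assigned
    target <- pending / (chunks_left + 1)
    # close the chunk when it is full enough, or when every remaining
    # distinct value is needed to give each later chunk one value
    if (chunks_left > 0 && (size >= target || values_left == chunks_left)) {
      pending <- pending - size
      size    <- 0
      g       <- g + 1
    }
  }

  xs <- sort(x)
  split(xs, grp[match(xs, vals)])
}

j <- c(1, 11, 1, 2, 1, 1, 1, 2, 4, 6, 3)
res <- chunk_ties(j, 4)
res
```

For the example vector this gives the chunks 1 1 1 1 1, then 2 2, then 3 4, then 6 11. That is not identical to the desired output in the question, but every chunk keeps equal values together and exactly 4 chunks are produced.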

1 Answer

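You could use hclust() to do a cluster analysis. Identical values have a distance of 0, so they are merged first and stay in the same cluster as long as you ask for no more clusters than there are distinct values. This does not produce the exact result you shared in your question, but the result aligns with your description.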

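# example vector from the question
j = c(1, 11, 1, 2, 1, 1, 1, 2, 4, 6, 3)

# hierarchical clustering on pairwise distances (complete linkage by default)
hc <- hclust(dist(j))

# cut the tree into 4 clusters
memb <- cutree(hc, 4)

# group the original values by cluster membership
split(j, memb)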
#> $`1`
#> [1] 1 1 2 1 1 1 2
#> 
#> $`2`
#> [1] 11
#> 
#> $`3`
#> [1] 4 3
#> 
#> $`4`
#> [1] 6
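
Since the input vector changes every time, it may be worth checking that no value was split across clusters. A minimal sketch of such a check (the variable name `clusters_per_value` is just for illustration):

```r
j <- c(1, 11, 1, 2, 1, 1, 1, 2, 4, 6, 3)
memb <- cutree(hclust(dist(j)), 4)

# for each distinct value, count how many different clusters it landed in
clusters_per_value <- tapply(memb, j, function(m) length(unique(m)))

# TRUE means every value sits in exactly one cluster
all(clusters_per_value == 1)

# cluster sizes, to see how balanced the groups are
table(memb)
```

Note that the clusters are based on distance rather than frequency, so the group sizes can be quite unequal (here 7, 1, 2 and 1).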
Till