Repeating elements in a vector with a for loop

Question

I want to make a vector from 3:50 in R, looking like

3 4 4 5 6 6 7 8 8 .. 50 50

I want to use a for loop in a for loop but it's not doing what I want.

f <- c()
for (i in 3:50) {
  for(j in 1:2) {
    f = c(f, i)
  }
}

What is wrong with it?

Growing vectors like that in a loop is a bad idea. You are solving a linear problem by an unintentionally quadratic algorithm. Use modular arithmetic to directly construct the vector. — John Coleman, Feb 17 '18 at 10:23
As you insist on using a nested for loop, as you posted below my answer: the error in your code is that you use `for(j in 1:2)` regardless of whether `i` is odd (then `j` should only be 1) or even (then `j` should loop through 1 and 2). So inside the outer for loop, you need to set the maximum value for `j`; let's call it `a`. Then the inner loop needs to look like `for( j in 1:a)`. You check if `i` is odd by using the modulo operator (see Wikipedia "modulo"): `if( i %% 2 ) ... `. — akraf, Feb 17 '18 at 10:57
I recommend you try to put the pieces together and post them as an answer yourself, if you insist I can post the correct answer, but then you're not gonna be a better programmer afterwards ;) — akraf, Feb 17 '18 at 10:57
... damn, typo: You check if i is odd by using `if( i %% 2 == 1) ...` — akraf, Feb 17 '18 at 11:06
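Putting the pieces from the comments above together, a nested for loop could look roughly like the sketch below. The name `a` follows akraf's comment; the position counter `k` and the pre-allocation with `integer(72)` are additions for illustration, not part of the original question.

```r
# Sketch of the nested-loop approach described in the comments.
# 3:50 holds 24 odd values (written once) and 24 even values (written twice),
# so the result has 24 + 48 = 72 elements.
f <- integer(72)   # pre-allocate instead of growing with c()
k <- 0L            # assumption: index of the last element written so far

for (i in 3:50) {
  # odd values are repeated once, even values twice
  a <- if (i %% 2 == 1) 1 else 2
  for (j in 1:a) {
    k <- k + 1L
    f[k] <- i
  }
}

f
```

The inner loop bound now depends on `i`, which is the change the original `for(j in 1:2)` was missing.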

Jaap · Answer 1 · 2018-02-17T11:49:14.843

16

Another option is to use an embedded rep:

rep(3:50, rep(1:2, 24))

which gives:

 [1]  3  4  4  5  6  6  7  8  8  9 10 10 11 12 12 13 14 14 15 16 16 17 18 18 19 20 20
[28] 21 22 22 23 24 24 25 26 26 27 28 28 29 30 30 31 32 32 33 34 34 35 36 36 37 38 38
[55] 39 40 40 41 42 42 43 44 44 45 46 46 47 48 48 49 50 50

This utilizes the fact that the times argument of rep can also be an integer vector with the same length as the x argument; each element of x is then repeated the number of times given at the matching position of times.
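A small illustration of a vector-valued times argument may help; the variable names `x` and `times` here are made up for the example.

```r
# Each element of x is repeated according to the matching element of times.
x <- c("a", "b", "c")
rep(x, times = c(1, 2, 3))
# "a" "b" "b" "c" "c" "c"

# For the question's vector, the inner rep() builds the alternating counts:
times <- rep(1:2, 24)            # 1 2 1 2 ... (48 values)
length(times) == length(3:50)    # TRUE, so the lengths match
```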

You can generalize this to:

s <- 3
e <- 50
v <- 1:2

rep(s:e, rep(v, (e-s+1)/2))

Yet another option, using a mix of rep and rep_len:

v <- 3:50
rep(v, rep_len(1:2, length(v)))
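The rep_len variant can be wrapped in a small helper. The function name `rep_alternating` and its arguments are hypothetical, chosen only for this sketch. As far as I can tell, the `(e-s+1)/2` generalisation assumes an even number of elements, while rep_len simply cuts the pattern to the required length.

```r
# Hypothetical helper: repeat the elements of s:e with a cycling count pattern v.
rep_alternating <- function(s, e, v = 1:2) {
  x <- s:e
  # rep_len() recycles v to exactly length(x), whether that is even or odd
  rep(x, rep_len(v, length(x)))
}

rep_alternating(3, 9)
# 3 4 4 5 6 6 7 8 8 9
```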

edited Feb 17 '18 at 11:49

answered Feb 17 '18 at 11:33

Jaap


2

This is both conceptually simple (though you have to think about it a bit to see how it works) and extremely fast (an average of just 1 microsecond on my machine). – John Coleman Feb 17 '18 at 11:36

www · Answer 2 · 2018-04-04T14:57:28.953

9

A solution based on sapply.

as.vector(sapply(0:23 * 2 + 2, function(x)  x + c(1, 2, 2)))

# [1]  3  4  4  5  6  6  7  8  8  9 10 10 11 12 12 13 14 14 15 16 16 17 18 18 19 20 20 21 22 22 23 24 24 25 26 26
# [37] 27 28 28 29 30 30 31 32 32 33 34 34 35 36 36 37 38 38 39 40 40 41 42 42 43 44 44 45 46 46 47 48 48 49 50 50

Benchmarking

Here is a comparison of performance for all the current answers. The result shows that cumsum(rep(c(1, 1, 0), 24)) + 2L (m8) is the fastest, while rep(3:50, rep(1:2, 24)) (m1) is almost as fast as m8.

library(microbenchmark)
library(ggplot2)

perf <- microbenchmark(
  m1 = {rep(3:50, rep(1:2, 24))},
  m2 = {rep(3:50, each = 2)[c(TRUE, FALSE, TRUE, TRUE)]},
  m3 = {v <- 3:50; sort(c(v,v[v %% 2 == 0]))},
  m4 = {as.vector(t(cbind(seq(3,49,2),seq(4,50,2),seq(4,50,2))))},
  m5 = {as.vector(sapply(0:23 * 2 + 2, function(x)  x + c(1, 2, 2)))},
  m6 = {sort(c(3:50, seq(4, 50, 2)))},
  m7 = {rep(seq(3, 50, 2), each=3) + c(0, 1, 1)},
  m8 = {cumsum(rep(c(1, 1, 0), 24)) + 2L},
  times = 10000L
)

perf
# Unit: nanoseconds
# expr   min    lq      mean median    uq     max neval
#   m1   514  1028  1344.980   1029  1542  190200 10000
#   m2  1542  2570  3083.716   3084  3085  191229 10000
#   m3 26217 30329 35593.596  31871 34442 5843267 10000
#   m4 43180 48321 56988.386  50891 55518 6626173 10000
#   m5 30843 35984 42077.543  37526 40611 6557289 10000
#   m6 40611 44209 50092.131  46779 50891  446714 10000
#   m7 13879 16449 19314.547  17478 19020 6309001 10000
#   m8     0  1028  1256.715   1028  1542   71454 10000
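The benchmark only compares timings, so it can be worth checking that all eight expressions return the same values. The sketch below does that; the names `results`, `reference` and `same` are made up for the check, and values are compared after `as.numeric()` because some expressions return integers and others doubles.

```r
# Evaluate each benchmarked expression once and compare the values.
results <- list(
  m1 = rep(3:50, rep(1:2, 24)),
  m2 = rep(3:50, each = 2)[c(TRUE, FALSE, TRUE, TRUE)],
  m3 = {v <- 3:50; sort(c(v, v[v %% 2 == 0]))},
  m4 = as.vector(t(cbind(seq(3, 49, 2), seq(4, 50, 2), seq(4, 50, 2)))),
  m5 = as.vector(sapply(0:23 * 2 + 2, function(x) x + c(1, 2, 2))),
  m6 = sort(c(3:50, seq(4, 50, 2))),
  m7 = rep(seq(3, 50, 2), each = 3) + c(0, 1, 1),
  m8 = cumsum(rep(c(1, 1, 0), 24)) + 2L
)

reference <- as.numeric(results$m1)
same <- vapply(
  results,
  function(r) isTRUE(all.equal(as.numeric(r), reference)),
  logical(1)
)
same   # one TRUE/FALSE per method
```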

edited Apr 04 '18 at 14:57

answered Feb 17 '18 at 12:30

www


@JohnColeman Thanks. The time difference is nanoseconds among different answers. Unless the OP really cares about the small time differences, I think all the answers would be a great choice. – www Feb 17 '18 at 12:49
1

Thanks for the benchmarking. It is nice to see the different answers and how they do. – kangaroo_cliff Feb 17 '18 at 12:57
1

@headpoint You're welcome. I am thinking how to explain this pattern. It looks like solutions with `seq` would be slower than solutions only using `rep`. In addition, adding `sort` may also increase some time. But again, the time difference is only nanoseconds. They are all good answers. – www Feb 17 '18 at 13:01
1

I was wondering about it, too. When I removed `sort` from `m6`, its average time is ~ 15k. Still, nowhere near `m1` or `m2`. So, `rep` must be much faster than `seq`. – kangaroo_cliff Feb 17 '18 at 13:14
1

Wonderful work. Very informative. I'm not surprised with time taken by `sort` . If I remove the `sort` from my solution then it will reach near to top performers but `sort` cannot be taken away. – MKR Feb 17 '18 at 14:02
1

For fun I microbenchmarked your microbenchmark statement. On my machine it averaged just under 2 seconds for `microbenchmark` to evaluate each of the 8 methods 10000 times, gather the timing results into a dataframe with 80000 observations and compute the relevant summary statistics. R can be surprisingly fast at times. – John Coleman Feb 17 '18 at 14:21

akraf · Accepted Answer · 2018-02-17T10:36:10.027

Use the rep function together with recycled logical indexing, ...[c(TRUE, FALSE, TRUE, TRUE)]:

rep(3:50, each = 2)[c(TRUE, FALSE, TRUE, TRUE)]

 ## [1]  3  4  4  5  6  6  7  8  8  9 10 10 11 12 12 13 14 14 15 16 16 17 18 18 19
## [26] 20 20 21 22 22 23 24 24 25 26 26 27 28 28 29 30 30 31 32 32 33 34 34 35 36
## [51] 36 37 38 38 39 40 40 41 42 42 43 44 44 45 46 46 47 48 48 49 50 50

If you use a logical vector (TRUE/FALSE) as index (inside [ ]), a TRUE leads to selection of the corresponding element and a FALSE leads to omission. If the logical index vector (c(TRUE, FALSE, TRUE, TRUE)) is shorter than the indexed vector (rep(3:50, each = 2) in your case), the index vector is recycled.
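To see the recycling on a shorter vector, here is a small sketch; `x`, `mask` and `doubled` are names made up for the example.

```r
# A 4-element logical mask is recycled over an 8-element vector:
# positions 1, 3, 4 are kept, then the mask repeats for 5, 7, 8.
x <- 11:18
mask <- c(TRUE, FALSE, TRUE, TRUE)
x[mask]
# 11 13 14 15 17 18

# Applied to a doubled sequence, the mask drops one copy of every other value.
doubled <- rep(3:6, each = 2)   # 3 3 4 4 5 5 6 6
doubled[mask]
# 3 4 4 5 6 6
```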

Also a side note: Whenever you use R code like

 x = c(x, something)

or

 x = rbind(x, something)

or similar, you are adopting a C-like programming style in R. This makes your code unnecessarily complex and might lead to low performance and out-of-memory issues if you work with large (say, 200MB+) data sets. R is designed to spare you that low-level tinkering with data structures.

For more information, read about the gluttons and their punishment in the R Inferno, Circle 2: Growing Objects.

I don't want the vector 3, 3, 4, 4, 5, 5, ... but 3, 4, 4, 5, 6, 6, 7 .. So repeating an element 1 and then 2 times. — Max, Feb 17 '18 at 10:30
For a school assignment I need to make the vector by using a nested for loop. But is it too difficult to explain? — Max, Feb 17 '18 at 10:42
Out of curiosity, I benchmarked each of the 3 solutions. Your solution is by far the fastest. — John Coleman, Feb 17 '18 at 11:13

score 5 · Answer 4 · answered Feb 17 '18 at 10:37

5

The easiest way I can find is to create another vector containing only the even values (based on the OP's intention) and then simply join the two vectors and sort the result. An example could be:

v <- 3:50
sort(c(v,v[v %% 2 == 0]))

# [1]  3  4  4  5  6  6  7  8  8  9 10 10 11 12 12 13 14 14 15 16 16
#      17 18 18 19 20 20 21 22 22 23 24 24 25 26 26 27 28 28
#[40] 29 30 30 31 32 32 33 34 34 35 36 36 37 38 38 39 40 40 41 42 42
#     43 44 44 45 46 46 47 48 48 49 50 50

answered Feb 17 '18 at 10:37

MKR


1

This is nice. At first I thought that the sort would make this slower, but `microbenchmark` shows that it is faster than my solution. – John Coleman Feb 17 '18 at 11:12

John Coleman · Answer 5 · 2018-02-17T11:20:56.377

Here is a loop-free 1 line solution:

> as.vector(t(cbind(seq(3,49,2),seq(4,50,2),seq(4,50,2))))
 [1]  3  4  4  5  6  6  7  8  8  9 10 10 11 12 12 13 14 14 15 16 16 17
[23] 18 18 19 20 20 21 22 22 23 24 24 25 26 26 27 28 28 29 30 30 31 32
[45] 32 33 34 34 35 36 36 37 38 38 39 40 40 41 42 42 43 44 44 45 46 46
[67] 47 48 48 49 50 50

It forms a matrix whose first column is the odd numbers in the range 3:50 and whose second and third columns are the even numbers in that range and then (by taking the transpose) reads it off row by row.
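The same idea on a tiny matrix makes the row-by-row reading visible. The name `m` is made up for this sketch.

```r
# Two rows of the pattern: odd value, then the even value twice.
m <- cbind(c(3, 5), c(4, 6), c(4, 6))
m
#      [,1] [,2] [,3]
# [1,]    3    4    4
# [2,]    5    6    6

# as.vector() reads a matrix column by column, so transposing first
# makes it read the original matrix row by row.
as.vector(t(m))
# 3 4 4 5 6 6
```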

The problem with your nested loop approach is that the fundamental pattern is one of length 3, repeated 24 times (instead of a pattern of length 2 repeated 48 times). If you wanted to use a nested loop, the outer loop could iterate 24 times and the inner loop 3 times. The first pass through the outer loop could construct 3,4,4. The second pass could construct 5,6,6. Etc. Since there are 24*3 = 72 elements, you can pre-allocate the vector (by using f <- vector("numeric",72) ) so that you aren't growing it 1 element at a time. The idiom f <- c(f,i) that you are using at each stage copies all of the old elements just to create a new vector which is only 1 element longer. Here there are too few elements for it to really make a difference, but if you try to create large vectors that way the performance can be shockingly bad.
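One way to read the 24-by-3 loop described above is sketched below. The names `block`, `pos`, `odd` and `pattern` are assumptions for illustration; only the pre-allocated `f` comes from the text.

```r
# Outer loop: 24 blocks; inner loop: 3 positions per block.
f <- vector("numeric", 72)   # pre-allocate all 24 * 3 elements

for (block in 1:24) {
  odd <- 2 * block + 1                  # 3, 5, ..., 49
  pattern <- c(odd, odd + 1, odd + 1)   # e.g. 3 4 4, then 5 6 6
  for (pos in 1:3) {
    # write into the slot belonging to this block and position
    f[3 * (block - 1) + pos] <- pattern[pos]
  }
}

f
```

Every element is written exactly once into an existing slot, so nothing is copied as the loop proceeds.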

lmo · Answer 6 · 2018-02-17T13:07:47.583

4

Here is a method that combines portions of a couple of the other answers.

rep(seq(3, 50, 2), each=3) + c(0, 1, 1)
 [1]  3  4  4  5  6  6  7  8  8  9 10 10 11 12 12 13 14 14 15 16
[21] 16 17 18 18 19 20 20 21 22 22 23 24 24 25 26 26 27 28 28 29
[41] 30 30 31 32 32 33 34 34 35 36 36 37 38 38 39 40 40 41 42 42
[61] 43 44 44 45 46 46 47 48 48 49 50 50

Here is a second method using cumsum

cumsum(rep(c(1, 1, 0), 24)) + 2L

This should be very quick.
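To see why the cumsum version works, it may help to look at two blocks of the pattern. The variable `steps` is a name made up for this sketch.

```r
# Each block adds 1, adds 1, then adds 0, so the third value repeats the second.
steps <- rep(c(1, 1, 0), 2)
steps
# 1 1 0 1 1 0

cumsum(steps)
# 1 2 2 3 4 4

# Shifting by 2 moves the start of the sequence to 3.
cumsum(steps) + 2L
# 3 4 4 5 6 6
```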

edited Feb 17 '18 at 13:07

answered Feb 17 '18 at 12:47

lmo


1

Just updated my benchmarking with your answer. Thanks for sharing it. – www Feb 17 '18 at 12:55
1

@www If you get a minute, consider adding my second method. It should be substantially faster. – lmo Feb 17 '18 at 13:11
1

I have added your second method to my benchmarking. You are right. It is amazingly fast. – www Feb 17 '18 at 13:20

score 3 · Answer 7 · answered Feb 17 '18 at 12:30

3

This should do too.

sort(c(3:50, seq(4, 50, 2)))

answered Feb 17 '18 at 12:30

kangaroo_cliff


score 0 · Answer 8 · answered Mar 05 '18 at 01:53

Another idea, though not competing in speed with fastest solutions:

mat <- matrix(3:50,nrow=2)
c(rbind(mat,mat[2,]))
# [1]  3  4  4  5  6  6  7  8  8  9 10 10 11 12 12 13 14 14 15 16 16 17 18 18 19 20 20 21 22 22
# [31] 23 24 24 25 26 26 27 28 28 29 30 30 31 32 32 33 34 34 35 36 36 37 38 38 39 40 40 41 42 42
# [61] 43 44 44 45 46 46 47 48 48 49 50 50
