
This is a simplified example of the arrays I have:

a = np.array([ 1, 12, 60, 80, 90, 210])
b = np.array([11, 30, 79, 89, 99, 232])

How can I use `a` as the start of each range and `b` as the end of each range, and quickly compute the array of all numbers in between?

So `c` would look like:

c = np.array([1,2,3,...,11, 12,13,14,...,29,30, 
              60,61,62,...79, ..., 210,211,...,231,232])

Ideally, this would be done in a vectorised way (using numpy/pandas) rather than with a plain Python loop.

A H
  • You should be able to use `zip()` here. Are `a` and `b` always the same size? – pault Jan 16 '18 at 13:50
  • 2
    If you import `add` from `operator`, you can do the following: `c = np.array(reduce(add, [range(x, y) for x, y in zip(a, b)]))` – pault Jan 16 '18 at 13:57
  • 3
    You can try doing it this way: `c= np.array(np.concatenate([np.arange(a[i],b[i]+1) for i in range(len(a))]))`. – Vasilis G. Jan 16 '18 at 13:58
  • I don't know about the speed difference in `np.concatenate()` vs using `reduce()` and `add()`, but I like @VasilisG.'s solution because it doesn't require any additional imports. – pault Jan 16 '18 at 14:00
  • 3
    You can also use a combination of Vasilis' and pault's answer, `c = np.concatenate([np.arange(x,y+1) for x,y in zip(a,b)])` – Thomas Kühn Jan 16 '18 at 14:02
  • @ThomasKühn thank you, I was about to say that. Using `zip` is better. – Vasilis G. Jan 16 '18 at 14:03
  • Excellent answers, much more readable than the 'duplicated' answer provided by Divakar, although his is quicker. Thanks guys.(Still checking if Divakar's answer gives me the correct result - got a large dataset to check through) – A H Jan 16 '18 at 14:08
  • 2
    @AH Would love to hear about the timings on your dataset. – Divakar Jan 16 '18 at 14:18
  • 1
    My array of ~50,000 items (in each a & b) Using your answer Divakar: 1.92 ms ± 48.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) Using the other answer: 97.6 ms ± 3.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) So 2 orders of magnitude, something to be expected of doing it vectorised. Thank you all for your answers :). – A H Jan 16 '18 at 15:25
  • 1
    Yeah, NumPy is good with those vectorized ones :) – Divakar Jan 16 '18 at 15:40
  • 1
    It's often the case that for small examples, list operations are faster. The array version may have larger overhead, and thus only has the advantage when the problem becomes large. – hpaulj Jan 16 '18 at 17:55
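The timings above compare the list-comprehension answer against a fully vectorised solution. Divakar's linked answer is not reproduced here, but the usual fully vectorised technique for this problem can be sketched with a cumsum trick (a sketch of the general approach, not necessarily his exact code):

```python
import numpy as np

a = np.array([1, 12, 60, 80, 90, 210])
b = np.array([11, 30, 79, 89, 99, 232])

# Length of each inclusive range [a[i], b[i]]
lengths = b - a + 1

# Step array: mostly ones, but at each range boundary the step jumps
# from the previous range's end to the next range's start.
steps = np.ones(lengths.sum(), dtype=a.dtype)
steps[0] = a[0]
steps[np.cumsum(lengths)[:-1]] = a[1:] - b[:-1]

# Cumulative sum of the steps reconstructs all the ranges at once,
# with no Python-level loop over the pairs.
c = np.cumsum(steps)
```

This avoids creating one temporary array per pair, which is where the two-orders-of-magnitude speedup on large inputs comes from.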

1 Answer


Summarizing the comments above: one way is to use `zip()` and `np.concatenate()`.

c = np.concatenate([np.arange(x, y+1) for x, y in zip(a,b)])

HT to @VasilisG. and @ThomasKühn.
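For reference, a self-contained version using the arrays from the question (note the `y + 1`, which makes each range inclusive of its end value, as the expected `c` requires):

```python
import numpy as np

a = np.array([1, 12, 60, 80, 90, 210])
b = np.array([11, 30, 79, 89, 99, 232])

# One inclusive arange per (start, end) pair, concatenated into a single array
c = np.concatenate([np.arange(x, y + 1) for x, y in zip(a, b)])

print(c[0], c[-1], c.size)  # 1 232 93
```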

pault