
I would like to append elements to an empty NumPy array in-place. I know the maximum array size beforehand. I can't find a direct way to accomplish that, so here is my workaround:

import numpy as np

N = 1000
a = np.empty(N, dtype=np.int32)  # preallocated; contents uninitialized
j = 0                            # number of slots filled so far

for i in range(N):
    if f(i):  # f and g stand in for the real filter and producer
        a[j] = g(i)
        j += 1

a.resize(j)  # shrink in-place to the filled length

Is there a more elegant way to code this, without keeping track of the current length in `j`, similar in simplicity to the C++ version below?

#include <vector>

const int N = 1000;

std::vector<int> a;
a.reserve(N);  // preallocate capacity up front

for (int i = 0; i < N; i++)
  if (f(i))
    a.push_back(g(i));

a.shrink_to_fit();  // release unused capacity

And yes, I read How to extend an array in-place in Numpy?, but it doesn't cover this specific case, i.e. an array size limit that is known beforehand.

Paul Jurczak
  • Why build an array and resize? You could directly use a list comprehension: `a = np.array([g(i) for i in range(N) if f(i)])` – Ch3steR Jan 17 '22 at 09:05
  • Python isn't C++ with a flair; it's going to be painful when you do low-level manipulations, if it's even possible at all. – Passer By Jan 17 '22 at 09:17
  • @Ch3steR Good point. I have two reasons: 1) `g()` and `f()` are really a bunch of statements too long to put inside a list comprehension, but not reasonable to make a function of; 2) this method creates a temporary list, which is not very time efficient for a large N, I think. – Paul Jurczak Jan 17 '22 at 09:20
  • Aside: `shrink_to_fit` either does nothing, or it copies all your elements. You may as well omit it and the `reserve` and let `vector` size itself with amortised constant copies – Caleth Jan 17 '22 at 09:20
  • "a bunch of statements too long to put inside a list comprehension, but not reasonable to make a function" anything can be a function. You can `def` it in the line above – Caleth Jan 17 '22 at 09:21
  • @Caleth `shrink_to_fit` can release unused memory, if compiler is smart enough. I want in-place operation, as mentioned in my post, hence `reserve`. – Paul Jurczak Jan 17 '22 at 09:24
  • @PaulJurczak That's what I mean. It releasing the unused memory is accomplished by moving (i.e. copying for `int`) everything to a new allocation. – Caleth Jan 17 '22 at 09:25
  • The `resize` method is the only operation that changes the size of a numpy array in-place. – hpaulj Jan 17 '22 at 09:34
  • I would need more information about the use case before giving any advice. How big will N really be? You can try two versions: one that resizes at the end as you do now, performing a copy, and one that loops once beforehand to determine the actual size (a sketch of this two-pass idea follows the comments). Measure the performance and choose accordingly. – codie Jan 17 '22 at 09:36
  • @Caleth True with a standard allocator, but you can chunk the memory with a custom allocator and return unused chunks at the tail end. – Paul Jurczak Jan 17 '22 at 09:36
  • @PaulJurczak nothing you do in a custom allocator will change the fact that `shrink_to_fit` *must* move (or copy if move could throw) the elements to a new allocation, if it does anything at all. – Caleth Jan 17 '22 at 09:39
  • @codie An extra loop could be justified in some cases, but I don't like the extra code. I posted my question to make sure I'm not missing something obvious. – Paul Jurczak Jan 17 '22 at 09:39
  • @PaulJurczak to me it is not about whether I like something or not. It is about solving a technical problem appropriately. And if the use-case gets solved by adding another loop, then so be it. But you know your use-case best. – codie Jan 17 '22 at 09:58
  • @PaulJurczak "shrink_to_fit can release unused memory, if compiler is smart enough." This has nothing to do with the compiler. This is done by the STL, and compilers can support multiple implementations. Moreover, in all implementations I am aware of, shrinking memory either does nothing or allocates a new, smaller data structure. This means that two buffers are temporarily allocated at the same time. Not to mention `std::vector` can have a capacity 2 times bigger than its size, so the footprint can be 3 times bigger than required. `shrink_to_fit` should be used carefully for huge vectors. – Jérôme Richard Jan 17 '22 at 15:30
  • @JérômeRichard True: *compiler* -> *library writer*. Even if the memory chunking I described above is not used or is prohibited by the C++ language definition, we have at most 2 allocations with `shrink_to_fit` vs. potentially N. – Paul Jurczak Jan 17 '22 at 19:49
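
A minimal sketch of the two-pass idea from the comments, assuming `f` and `g` are the question's placeholder functions: count the matches first, then let `np.fromiter` allocate exactly that many slots via its `count` parameter:

import numpy as np

N = 1000

# Pass 1: count how many indices pass the filter.
n = sum(1 for i in range(N) if f(i))

# Pass 2: allocate exactly n elements and fill them; count=n avoids
# any growing or final resize, at the price of calling f twice per index.
a = np.fromiter((g(i) for i in range(N) if f(i)), np.int32, count=n)

Whether the extra pass over `f` beats a single `resize` at the end depends on how expensive `f` is; as noted above, measuring both versions is the only reliable way to choose.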

1 Answer


Use `np.fromiter(iter, dtype, count)` with a generator expression:

a = np.fromiter((g(i) for i in range(N) if f(i)), np.int32)

If the parameter `count` is omitted, the array grows automatically as the iterator is consumed, at some cost in performance.

However, you cannot avoid that cost by passing the maximum length as `count`: if the iterator yields fewer than `count` items, `np.fromiter` raises `ValueError: iterator too short`.
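
A small runnable illustration of both points, with hypothetical stand-ins for `f` and `g` (keep even indices, square them):

import numpy as np

N = 1000

def f(i):  # hypothetical predicate: keep even indices
    return i % 2 == 0

def g(i):  # hypothetical producer
    return i * i

# count omitted: NumPy sizes the result itself (500 elements here).
a = np.fromiter((g(i) for i in range(N) if f(i)), np.int32)
print(a.size)  # 500

# count=N overshoots the 500 items the generator actually yields:
try:
    np.fromiter((g(i) for i in range(N) if f(i)), np.int32, count=N)
except ValueError as e:
    print(e)  # "iterator too short: ..."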

Mechanic Pig