
Let's assume I want to store (integer) x/y values. What is considered more efficient: storing them in a primitive value like `long` (which fits perfectly, since sizeof(long) = 2*sizeof(int)) using bit operations like shift, OR and a mask, or creating a Point class?
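
Here is a minimal sketch of the shift-and-mask packing the question describes. It assumes x goes into the high 32 bits and y into the low 32 bits. The class and method names (`PackedPoint`, `pack`, `unpackX`, `unpackY`) are hypothetical and only for illustration:

```java
public final class PackedPoint {
    // Hypothetical helper: x in the high 32 bits, y in the low 32 bits.
    static long pack(int x, int y) {
        // The mask stops a negative y from sign-extending into (and overwriting) x.
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    static int unpackX(long packed) {
        // Shift the high 32 bits down; the cast keeps exactly those bits.
        return (int) (packed >> 32);
    }

    static int unpackY(long packed) {
        // Casting to int keeps the low 32 bits, restoring y including its sign.
        return (int) packed;
    }

    public static void main(String[] args) {
        long p = pack(-3, -7);
        System.out.println(unpackX(p) + ", " + unpackY(p)); // prints "-3, -7"
    }
}
```

The mask on `y` is the step that is easy to get wrong. Without it, `pack(1, -1)` would OR 64 one-bits into the result and lose x entirely.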

Keep in mind that I want to create and store many(!) of these points (in a loop). Would there be a performance issue when using classes? The only reason I would prefer storing them in primitives over a class is the garbage collector. I guess generating new objects in a loop would trigger the GC way too often. Is that correct?

NHSH
  • As you create new instances in your loop, are they short-lived, i.e. do they become dereferenced? If you still hold references to them, e.g. you're putting them in a List, they're not eligible for garbage collection anyway. When you say many, are you talking multiple millions? I wouldn't spend time optimizing a problem you don't know you have yet. – Kevin Hooke May 21 '21 at 16:28
  • There is the famous saying that "premature optimization is the root of all evil". It is often applied to writing code that is harder to understand and maintain in the hope of some possible performance gain, without any data showing that the code even needs to be optimized. Your first proposed solution sounds like just that. – OH GOD SPIDERS May 21 '21 at 16:28
  • My advice would be to start off by creating and using a custom Point class. Once that solution works, you can actually measure its performance and see for yourself whether it is a bottleneck that needs optimizing. If you instead choose to write more obtuse code, you might end up having needlessly made your code more complex and harder to maintain, and more likely to contain future bugs, without ever knowing whether that "optimization" was necessary to begin with. – OH GOD SPIDERS May 21 '21 at 16:28
  • Why do you consider storing the x/y values in a `long`, “using bit-operations like shift, or and a mask”, instead of just two `int` variables? – Holger May 25 '21 at 08:12
  • @Holger Because that way I would have had to use two ArrayLists, one for each component. Using a long instead of two separate integers allows me to put both components into a single list. But it seems Java, for some reason, has no problem with creating new objects inside a loop. As far as I remember, I couldn't measure any performance loss. Best regards – NHSH May 25 '21 at 17:20
  • There’s no sense in using a primitive value just to store it in an `ArrayList` that can only hold boxed values, in other words, objects. A `Long` object isn’t better than a `Point` object. If you want to process *bulk data*, i.e. a large number of values that need to exist in memory at once, just use an `int[]` of length `2*n` to store *n* points. For a lot of purposes, just using `Point` objects is indeed no problem. For those cases where the number of objects really creates a problem, remember encapsulation, e.g. use a class with a `Point`-based API but internally using a primitive array. – Holger May 25 '21 at 17:44
  • Hm, so generics in Java make no difference in performance? An `ArrayList<Long>` is the same as an `ArrayList<Point>`? Okay, that makes sense, but in C# a List of a value type such as long is actually something different from a List of objects, and I assumed the same for Java. – NHSH May 25 '21 at 20:26
  • In Java, there is no `List<long>`. A `List<Long>` is not the same as a `List<Point>`, but you can pass it to any code expecting a `List<? extends Object>` or to any generic code dealing with a `List<T>` when its `T` has no prohibiting bounds. There are [plans to add primitive type support](https://openjdk.java.net/jeps/218), but these are long-term plans. You may read the “Open questions” there to see what is slowing down the process. – Holger May 26 '21 at 08:09
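
Holger's suggestion above is to keep a `Point`-style API but store the coordinates in one `int[]` of length `2*n`. Here is a minimal sketch of that idea. The class name `PointBuffer`, its methods, and the nested `Point` class are hypothetical stand-ins, not an existing library:

```java
import java.util.Arrays;

// Hypothetical class: a Point-based API over a single primitive array.
public final class PointBuffer {
    // Minimal immutable point, standing in for the asker's own Point class.
    public static final class Point {
        public final int x;
        public final int y;
        public Point(int x, int y) { this.x = x; this.y = y; }
    }

    private int[] coords = new int[16]; // x of point i at 2*i, y at 2*i + 1
    private int size;                   // number of points stored

    public void add(int x, int y) {
        if (2 * size + 2 > coords.length) {
            // Grow geometrically (as ArrayList does) so copying stays amortized O(1).
            coords = Arrays.copyOf(coords, coords.length * 2);
        }
        coords[2 * size] = x;
        coords[2 * size + 1] = y;
        size++;
    }

    public int size() { return size; }

    // Primitive accessors: no object is allocated.
    public int getX(int i) { checkIndex(i); return coords[2 * i]; }
    public int getY(int i) { checkIndex(i); return coords[2 * i + 1]; }

    // Object view for callers that want a Point; allocates only on demand.
    public Point get(int i) { return new Point(getX(i), getY(i)); }

    private void checkIndex(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("Index: " + i + ", size: " + size);
        }
    }
}
```

Callers only see `add`, `getX`/`getY` and `get`. The storage can later switch between `Point` objects and a primitive array without touching them, which is the encapsulation argument from the comment.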

1 Answer


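Of course, packing those into a long[] is going to take less memory (and it is going to be contiguous). For each object (a Point), you will pay at least 12 extra bytes for the object header (on a 64-bit HotSpot JVM with compressed class pointers: an 8-byte mark word plus a 4-byte class pointer).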
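
A rough back-of-the-envelope estimate of that overhead makes the comparison concrete. It assumes a 64-bit HotSpot JVM with compressed oops and compressed class pointers (the default for heaps under about 32 GB), 8-byte object alignment, and a Point class with exactly two `int` fields kept in an array or `ArrayList`. Real numbers vary by JVM and flags:

```latex
\begin{aligned}
\text{Point object (header + two ints)} &= 12 + 2 \cdot 4 = 20 \;\rightarrow\; 24 \text{ bytes after 8-byte alignment}\\
\text{per point, incl. a 4-byte reference} &= 24 + 4 = 28 \text{ bytes}\\
\text{per point in a packed long[]} &= 8 \text{ bytes (plus one 16-byte array header in total)}\\
n = 10^6:\quad 28 \cdot 10^6 \text{ B} \approx 28 \text{ MB} &\quad\text{vs.}\quad 8 \cdot 10^6 + 16 \text{ B} \approx 8 \text{ MB}
\end{aligned}
```

Under the same assumptions, a boxed `Long` is also 12 + 8 = 20, padded to 24 bytes. So an `ArrayList<Long>` of packed values costs about as much per element as an `ArrayList` of points.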

On the other hand, if you create them in a loop and escape analysis can prove that they don't escape, the JIT can apply an optimization called "scalar replacement" (though it is very fragile), where your objects are not allocated at all. Instead, those objects are "desugared" into their fields, held in local variables. This only helps short-lived points, though: points you store in a collection do escape and are always allocated.
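
Here is a small sketch of the kind of loop where scalar replacement may kick in. It assumes a HotSpot JVM whose C2 compiler has escape analysis enabled, which is the default. The `Point` class and method names are hypothetical. Whether allocation is actually eliminated is up to the JIT and is not observable from the result:

```java
public final class ScalarReplacementDemo {
    // Minimal immutable point, standing in for the asker's own Point class.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static long sumOfManhattanLengths(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            // p is never stored, returned, or passed elsewhere, so it does not escape.
            // Once this method is JIT-compiled, HotSpot may keep p.x and p.y in
            // registers and skip the heap allocation entirely.
            Point p = new Point(i, -i);
            sum += Math.abs(p.x) + Math.abs(p.y);
            // By contrast, list.add(p) here would make p escape, forcing a real allocation.
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfManhattanLengths(1_000)); // prints 999000
    }
}
```

To see the effect, run the loop with a large `n` twice, once with the HotSpot flag `-XX:-DoEscapeAnalysis` and once without, and compare GC activity. That is a quick experiment, not a rigorous benchmark.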

The general rule is that you should write your code in the way that is easiest to read and maintain. Only if you see performance issues (say, via a profiler, or too many GC pauses) should you look at the GC logs and potentially optimize the code.
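
Before optimizing, you can check cheaply whether the GC is even busy. The sketch below reads the standard `java.lang.management` beans before and after a workload. The `workload` method is a hypothetical stand-in for your own loop. A profiler or the JVM's GC log gives far more detail; this is only a first sanity check:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public final class GcCheck {
    // Total collections across all collectors; a bean reports -1 if the value is undefined.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds (-1 also means undefined).
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    // Hypothetical workload: many short-lived two-element arrays standing in for points.
    static long workload(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            int[] point = {i, i + 1};
            sum += point[0] + point[1];
        }
        return sum;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();
        long result = workload(10_000_000);
        System.out.printf("result=%d, GC runs=%d, GC time=%d ms%n",
                result,
                totalCollections() - countBefore,
                totalCollectionMillis() - millisBefore);
    }
}
```

A delta of zero or a few milliseconds suggests the GC is not your bottleneck. Because the arrays in `workload` never escape, the JIT may even eliminate their allocation, which is the answer's point about escape analysis.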

As an addendum, the JDK code itself is full of such longs where each bit means something different, so they do pack values. But I am not a JDK developer, and I doubt you are. There, such things matter; for us, I have serious doubts.

Eugene