
While my program runs, I create a lot of Strings (about 1,000,000), each up to 700 characters long, and the program uses a lot of memory. These Strings can contain only the characters R, D, L and U, so I thought I could represent them differently. I considered using BitSet, but I am not sure it would be more memory-efficient. Any ideas?

P.S.: I could also shrink the Strings by collapsing runs of equal characters (RRRRRRDDDD -> R6D4), but I was hoping for a better solution.
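For reference, the run-length idea from the P.S. could look roughly like the sketch below. The class and method names (`RunLength`, `encode`, `decode`) are made up for illustration, and the format simply follows the R6D4 example: each character followed by its repeat count.

```java
// Hypothetical helper, not from the original post: run-length encoding
// in the "R6D4" style, where each character is followed by its count.
final class RunLength {

    // "RRRRRRDDDD" -> "R6D4"
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int j = i;
            // Advance j to the end of the current run of c.
            while (j < s.length() && s.charAt(j) == c) {
                j++;
            }
            out.append(c).append(j - i);
            i = j;
        }
        return out.toString();
    }

    // "R6D4" -> "RRRRRRDDDD"
    static String decode(String encoded) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i++);
            int count = 0;
            // Read the decimal repeat count that follows the character.
            while (i < encoded.length() && Character.isDigit(encoded.charAt(i))) {
                count = count * 10 + (encoded.charAt(i++) - '0');
            }
            for (int k = 0; k < count; k++) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Note that this only helps when the Strings contain long runs: a String with no repeats, such as RDLU, becomes R1D1L1U1 and doubles in size.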

Epitheoritis 32
  • A bit has only two values, 0 or 1. You have to store four different values, so one bit per character won't work. Two bits would be necessary to represent each value. –  Aug 14 '19 at 09:46
  • You have to represent 4 different values, so using two bits (00, 01, 10, 11) for each character could be an alternative. – cocool97 Aug 14 '19 at 09:49
  • Maybe [EnumSet](https://www.baeldung.com/java-enumset) is suitable? – Abra Aug 14 '19 at 09:50
  • If you could just change your letters to A, C, G and T then this is a full duplicate of your question with useful answers: https://stackoverflow.com/questions/40417632/dna-compression-using-bitset-java/40418156#40418156 (and still useful if you can't because the mechanisms are the same) – Erwin Bolwidt Aug 14 '19 at 09:59
  • Unfortunately, EnumSet as per Abra's suggestion is not suitable here because you would only be able to store at most one of each letter in it. – NorthernSky Aug 14 '19 at 10:01
  • Looks like an XY problem (https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). As the community can provide a varied range of solutions to specific problems, can you explain what your program is trying to accomplish that needs the creation of so many large strings? – Filippo Possenti Aug 14 '19 at 10:41
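The two-bits-per-character idea from the comments above could be sketched as follows. This is only an illustration under some assumptions: the class name `PackedMoves` and the code assignment (R=0, D=1, L=2, U=3) are made up here, and a plain `long[]` is used instead of `BitSet`, although either would work. A 700-character String then needs 22 longs (176 bytes) of payload plus object overhead.

```java
// Hypothetical sketch: stores a sequence over {R, D, L, U} using 2 bits
// per character, 32 characters per long.
final class PackedMoves {
    // Position in this string is the 2-bit code (assumed mapping).
    private static final String ALPHABET = "RDLU";

    private final long[] words;
    private final int length;

    PackedMoves(String s) {
        length = s.length();
        words = new long[(length + 31) / 32];
        for (int i = 0; i < length; i++) {
            int code = ALPHABET.indexOf(s.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("Unexpected char: " + s.charAt(i));
            }
            // i >>> 5 selects the long, (i & 31) * 2 the bit offset inside it.
            words[i >>> 5] |= (long) code << ((i & 31) * 2);
        }
    }

    int length() {
        return length;
    }

    char charAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException();
        }
        int code = (int) (words[i >>> 5] >>> ((i & 31) * 2)) & 3;
        return ALPHABET.charAt(code);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }
}
```

Usage would be along the lines of `new PackedMoves("RRDLU").charAt(2)`, which reads back the third character without unpacking the whole sequence.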

1 Answer


As a first step, you could try switching to char[]. A Java String takes approximately 40 bytes more than the sum of its characters (source), and char[] is considerably more convenient than bit arithmetic.

Even more economical is byte[], since a char occupies two bytes while a byte is, of course, one byte (and still has room for 256 distinct values).
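A minimal sketch of the byte[] variant, assuming the Strings really contain only R, D, L and U (all of which are ASCII). The class and method names (`ByteMoves`, `toBytes`, `fromBytes`) are invented for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: keeps each sequence as a byte[] (one byte per
// character) and converts back to String only when needed.
final class ByteMoves {

    // R, D, L and U are ASCII, so each character maps to exactly one byte.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}
```

Individual characters can still be read directly from the array, for example `(char) bytes[i]`, so most code written against char[] carries over with few changes.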

Sharon Ben Asher