
I have a complex object that holds a million ints:

int[] ints = new int[1000000];

If I save those values directly via a ByteBuffer, the file size is about 5MB.

When I save those values to a protocol buffer object, it stores each value not as an int but as an Integer. When I then write the resulting byte[] to a file, the file size is over 8MB.

It seems protocol buffers do not provide a primitive array type.

Is there any way (or trick) to reduce the byte[] size of a protocol buffer object that contains a million ints?

Jihun No

1 Answer

"When I save those values to a protocol buffer object"

How exactly are you doing that? Normally, with protobuf, you define some type in a .proto schema; the obvious contender here would be:

syntax = "proto3";
message Whatever {
    repeated int32 ints = 1;
}

In proto3, "packed" encoding is the default for repeated scalar numeric fields, so this should use "packed" encoding. The resulting size depends somewhat on the data, since int32 uses "varint" encoding, but for 1,000,000 elements it could be anywhere between 1,000,004 and 10,000,005 bytes: between 1 and 10 bytes per element, plus 1 byte for the field header and 3 or 4 bytes for the length prefix. 10 bytes per element usually means negative numbers encoded as int32.
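The size range can be checked without the protobuf library. The following is a minimal sketch, assuming a field number between 1 and 15 (so the tag fits in one byte); the class and method names are illustrative and are not part of the protobuf API.

```java
import java.util.Arrays;

public class PackedInt32Size {
    // Bytes needed to varint-encode a 64-bit value (7 payload bits per byte).
    public static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    // int32 values are sign-extended to 64 bits before varint encoding,
    // so every negative value costs 10 bytes.
    public static int int32Size(int value) {
        return varintSize((long) value);
    }

    // Size of a packed `repeated int32` field:
    // 1 tag byte + varint length prefix + payload.
    // An empty repeated field is not written at all.
    public static long packedFieldSize(int[] values) {
        if (values.length == 0) {
            return 0;
        }
        long payload = 0;
        for (int v : values) {
            payload += int32Size(v);
        }
        return 1 + varintSize(payload) + payload;
    }

    public static void main(String[] args) {
        int[] zeros = new int[1_000_000];      // best case: 1 byte per element
        int[] negatives = new int[1_000_000];
        Arrays.fill(negatives, -1);            // worst case: 10 bytes per element
        System.out.println(packedFieldSize(zeros));      // 1000004
        System.out.println(packedFieldSize(negatives));  // 10000005
    }
}
```

Running the same calculation over the real data before changing the schema would show where in that range the payload actually falls.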

If you know the values are often negative, or often large, you could choose to use sint32 (uses zig-zag encoding; avoids the 10 bytes for negative numbers) or sfixed32 (always uses 4 bytes per element) instead of int32, but "packed" encoding should still apply.
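To see how the three choices compare per element, here is a small sketch that computes the encoded size of one value under each type. The helper names are hypothetical; the zig-zag mapping is the standard `(n << 1) ^ (n >> 31)` used for sint32.

```java
public class Int32EncodingCompare {
    // Bytes needed to varint-encode a 64-bit value (7 payload bits per byte).
    public static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    // Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    // so values of small magnitude stay small after encoding.
    public static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // int32: sign-extended to 64 bits, so negatives take 10 bytes.
    public static int int32Size(int v) {
        return varintSize((long) v);
    }

    // sint32: zig-zag first, then varint of the unsigned 32-bit result (1 to 5 bytes).
    public static int sint32Size(int v) {
        return varintSize(zigZag(v) & 0xFFFFFFFFL);
    }

    // sfixed32: always 4 bytes, regardless of the value.
    public static int sfixed32Size(int v) {
        return 4;
    }

    public static void main(String[] args) {
        int[] samples = {1, -1, 300, -300, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            System.out.printf("%d: int32=%d sint32=%d sfixed32=%d%n",
                    v, int32Size(v), sint32Size(v), sfixed32Size(v));
        }
    }
}
```

For values that use most of the 32-bit range in both directions, sint32 costs 5 bytes per element while sfixed32 stays at 4, which is why sfixed32 can win for data that is already compressed or otherwise high-entropy.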

In proto2, you need to opt in to "packed":

syntax = "proto2";
message Whatever {
    repeated int32 ints = 1 [packed=true];
}
Marc Gravell
  • The first .proto example is exactly what I have. The data in int[] is actually the result of PFor compression of hundreds of millions of ints, so each value seems to be quite large, both positive and negative. I will try sint32 and sfixed32 and report back. – Jihun No Dec 05 '19 at 17:58
  • @Jihun you could also try explicitly adding the "packed" hint – Marc Gravell Dec 05 '19 at 18:23
  • In my case, `repeated sfixed32` produced the smallest file, almost the same size as the file written via IntBuffer. – Jihun No Dec 11 '19 at 01:25