I am using the lightbend/config library to parse and store HOCON data. The library uses HashMaps to represent the data hierarchy, and it has more memory overhead than plain JSON, partly because it keeps extra information for debugging and substitutions.
I am looking for an alternative, memory-efficient in-memory representation of the data, based on the structural information we have in the JSON Schema.
The data is stored immutably, so the main focus is on efficient storage and retrieval.
(I need to load the configs into memory, and there may be a huge number of them.)
(Ideally, the new structure would reduce memory without losing support for dot-notation access, and would also support substitutions if possible.)
These are the ideas I currently have.
(I still use lightbend/config for parsing and focus only on the storage part for now. I also haven't tested support for substitutions.)
The data used here to compare the ideas has no substitutions, and I am using the size of the equivalent JSON object as the baseline.
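For reference, the hash-map representation and the dot-notation access that any replacement should keep can be sketched with plain nested maps. This is a simplified, hypothetical stand-in (NestedMapConfig and get are illustrative names, not part of lightbend/config), not the library's internals, which additionally keep origin information for every value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simplified baseline: every HOCON object is a Map, leaves are String/Integer.
final class NestedMapConfig {
    private final Map<String, Object> root;

    NestedMapConfig(Map<String, Object> root) {
        this.root = root;
    }

    // Resolves a dot-separated path such as "app.server.port", one segment at a time.
    Object get(String path) {
        Object current = root;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map) || !((Map<?, ?>) current).containsKey(segment)) {
                throw new IllegalArgumentException("No value at path: " + path);
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    // Builds part of the example data used in this question.
    static NestedMapConfig sample() {
        Map<String, Object> server = new LinkedHashMap<>();
        server.put("port", 8080);
        server.put("host", "localhost");
        Map<String, Object> app = new LinkedHashMap<>();
        app.put("name", "My App");
        app.put("version", "1.0");
        app.put("server", server);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("app", app);
        return new NestedMapConfig(root);
    }
}
```

Every object level costs a HashMap, its bucket array, and one entry node per key, which is the per-key overhead the approaches below try to remove.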
- First approach: store the configs as POJOs generated from the JSON Schema. There may be thousands of configs.
  (For testing, I just used jsonschema2pojo to generate the classes: HOCON to JSON, then JSON Schema to POJO.)
  The main advantages I see here are:
    - it preserves dot-notation access
    - substitutions can be achieved through object references (for non-primitive types)
    - it is type safe
    - it uses less memory than hash maps, comparable to the object-array approach below.
  The disadvantages:
    - it requires creating many classes and many instances of each class
    - high instantiation time (a one-time cost)
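A minimal sketch of the POJO idea, using hand-written immutable Java records (Java 16+) as stand-ins for generated classes; the type names are hypothetical and this is not actual jsonschema2pojo output. The free-form author keys (additionalProperties) stay a Map, and a substitution is modelled as two keys pointing at the same instance.

```java
import java.util.Map;

// Hypothetical immutable stand-ins for classes generated from the example schema.
record Author(String name, String email) {}

record Server(int port, String host) {}

// "author" has free-form keys (additionalProperties), so it stays a Map.
record App(String name, String version, Map<String, Author> author, Server server) {
    App {
        author = Map.copyOf(author); // unmodifiable copy keeps the record immutable
    }
}

record RootConfig(App app) {}

final class PojoDemo {
    static RootConfig sample() {
        Author john = new Author("John Doe", "john.doe@example.com");
        // Both keys share one instance, which is how a resolved substitution
        // (e.g. bob = ${app.author.alice}) could avoid duplicating storage.
        Map<String, Author> authors = Map.of("alice", john, "bob", john);
        return new RootConfig(new App("My App", "1.0", authors, new Server(8080, "localhost")));
    }

    public static void main(String[] args) {
        RootConfig root = sample();
        // Dot-notation-style access, checked by the compiler.
        System.out.println(root.app().server().port());              // prints 8080
        System.out.println(root.app().author().get("alice").name()); // prints John Doe
    }
}
```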
- Second approach: use object arrays (arrays of references) and drop the keys. For example:
JSON Schema (in HOCON format):
{
  type: object
  properties {
    app {
      type: object
      additionalProperties: false
      properties {
        name {type: string}
        version {type: string}
        author {
          type: object
          additionalProperties: { // author names can be anything, so keep this as a hash map
            type: object
            additionalProperties: false
            properties: {
              name {type: string}
              email {type: string}
            }
          }
        }
        server {
          type: object
          additionalProperties: false
          properties {
            port {type: integer}
            host {type: string}
          }
        }
      }
    }
  }
}
HOCON data for the above schema:
app {
  name = "My App"
  version = "1.0"
  author {
    alice {
      name = "John Doe"
      email = "john.doe@example.com"
    }
    bob {
      name = "John Doe"
      email = "john.doe@example.com"
    }
  }
  server {
    port = 8080
    host = "localhost"
  }
}
I created a class that assigns an index to each key; the object array stores the data at those indexes (the hash maps are converted to object arrays).
final class Root {
    String getPath() {
        return "";
    }

    App app = new App();

    final class App {
        final int index = 0;
        Name name = new Name();
        Version version = new Version();
        Author author = new Author();
        Server server = new Server();

        String getPath() {
            return Root.this.getPath() + index;
        }

        final class Name {
            final int index = 0;
            String getPath() {
                return app.getPath() + "." + index;
            }
        }

        final class Version {
            final int index = 1;
            String getPath() {
                return app.getPath() + "." + index;
            }
        }

        final class Author { // handles the hash map
            final int index = 2;
            String key; // the map key is only known at runtime (mutable, so not thread-safe)
            Name name = new Name();
            Email email = new Email();

            String getPath() {
                return app.getPath() + "." + index;
            }

            String getPathWithKey() {
                return app.getPath() + "." + index + "." + key;
            }

            Author setKey(String key) {
                this.key = key;
                return this;
            }

            final class Name {
                final int index = 0;
                String getPath() {
                    return author.getPathWithKey() + "." + index;
                }
            }

            final class Email {
                final int index = 1;
                String getPath() {
                    return author.getPathWithKey() + "." + index;
                }
            }
        }

        final class Server {
            final int index = 3;
            Port port = new Port();
            Host host = new Host();

            String getPath() {
                return app.getPath() + "." + index;
            }

            final class Port {
                final int index = 0;
                String getPath() {
                    return server.getPath() + "." + index;
                }
            }

            final class Host {
                final int index = 1;
                String getPath() {
                    return server.getPath() + "." + index;
                }
            }
        }
    }
}
The object array built according to this class is:
root[
    0: app[
        0: name,
        1: version,
        2: author {
            alice: [0: name, 1: email],
            bob:   [0: name, 1: email]
        },
        3: server[0: port, 1: host]
    ]
]
The class then provides the index path of any value. For example:
Root root = new Root();
root.app.author.setKey("alice").name.getPath()
returns: 0.2.alice.0
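To make the index paths concrete, here is a minimal sketch of the object-array storage itself and a resolver that follows a path such as 0.2.alice.0. ArrayConfig, build, resolve and port are hypothetical names: numeric segments index into an Object[], while the free-form author keys go through a Map<String, Object[]>, mirroring the layout above.

```java
import java.util.Map;

final class ArrayConfig {
    // Index constants taken from the Root/App/Author/Server classes above.
    static final int APP = 0;
    static final int NAME = 0, VERSION = 1, AUTHOR = 2, SERVER = 3;
    static final int AUTHOR_NAME = 0, AUTHOR_EMAIL = 1;
    static final int PORT = 0, HOST = 1;

    static Object[] build() {
        Object[] alice = {"John Doe", "john.doe@example.com"};
        Object[] bob = {"John Doe", "john.doe@example.com"};
        Map<String, Object[]> authors = Map.of("alice", alice, "bob", bob);
        Object[] server = {8080, "localhost"};
        Object[] app = {"My App", "1.0", authors, server};
        return new Object[] {app};
    }

    // Walks the structure: arrays are indexed by number, maps are looked up by key.
    static Object resolve(Object[] root, String path) {
        Object current = root;
        for (String segment : path.split("\\.")) {
            if (current instanceof Object[]) {
                current = ((Object[]) current)[Integer.parseInt(segment)];
            } else if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(segment);
            } else {
                throw new IllegalArgumentException("Cannot descend into '" + segment + "' in " + path);
            }
        }
        return current;
    }

    // Direct access with the constants: every step needs a cast.
    static int port(Object[] root) {
        Object[] app = (Object[]) root[APP];
        Object[] server = (Object[]) app[SERVER];
        return (Integer) server[PORT];
    }
}
```

Every access goes through Object and needs a cast, so a wrong index or path only fails at runtime.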
Compared with the POJO approach:
- memory is not reduced much
- type safety is lost (which doesn't feel like good design)
- all references are generic Object references
Both POJOs and object arrays seem to allocate a fixed-size slot per reference; the difference is that a POJO's references have specific types, while an Object[] holds generic Object references.
Is there any advantage to preferring object arrays over POJOs?
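A back-of-the-envelope estimate suggests why the two come out so close. It assumes a 64-bit HotSpot JVM with compressed oops and compressed class pointers (the usual default for heaps under 32 GB): an ordinary object typically has a 12-byte header, an array a 16-byte header (including its length), each reference takes 4 bytes, and objects are padded to a multiple of 8 bytes. These constants are assumptions, not guarantees, and differ on other JVMs or settings; a tool such as JOL (Java Object Layout) can measure the real layout. ShallowSize is a hypothetical helper.

```java
// Estimated shallow sizes under the assumptions stated above (not measured).
final class ShallowSize {
    static final int OBJECT_HEADER = 12;
    static final int ARRAY_HEADER = 16;
    static final int REFERENCE = 4;
    static final int ALIGNMENT = 8;

    static int align(int bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // A POJO with n reference fields.
    static int pojo(int referenceFields) {
        return align(OBJECT_HEADER + referenceFields * REFERENCE);
    }

    // An Object[] holding the same n references.
    static int objectArray(int length) {
        return align(ARRAY_HEADER + length * REFERENCE);
    }

    public static void main(String[] args) {
        // App has 4 fields: name, version, author, server.
        System.out.println("POJO:     " + pojo(4) + " bytes");        // 12 + 16 = 28, padded to 32
        System.out.println("Object[]: " + objectArray(4) + " bytes"); // 16 + 16 = 32
    }
}
```

Under these assumptions an Object[] is never smaller than a POJO with the same number of reference fields (with 3 fields it is 32 bytes versus 24), so the array layout by itself saves nothing per record. The leaf String and Integer objects that both layouts point to often cost more than the container.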
This question discusses the memory cost of classes versus arrays.
The solution there was to store each pair of doubles sequentially in a single one-dimensional array. A third approach builds on the same observation:
- int[] stores the integers contiguously on the heap, whereas Integer[] stores contiguous references to Integer objects scattered across the heap.
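The flattening technique from that question can be sketched as follows: n points are stored as interleaved (x, y) pairs in a single double[], so point i lives at indexes 2*i and 2*i + 1. FlatPoints is a hypothetical name.

```java
// Stores n (x, y) pairs in one primitive array: [x0, y0, x1, y1, ...].
final class FlatPoints {
    private final double[] coords;

    FlatPoints(int count) {
        coords = new double[2 * count];
    }

    void set(int i, double x, double y) {
        coords[2 * i] = x;
        coords[2 * i + 1] = y;
    }

    double x(int i) {
        return coords[2 * i];
    }

    double y(int i) {
        return coords[2 * i + 1];
    }

    int size() {
        return coords.length / 2;
    }
}
```

Compared with a Point[] of n objects, this avoids n object headers and n references; the trade-off is that callers work with indexes instead of objects.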
Following the same idea, I can treat each schema like a primitive type with a fixed size. For the example schema above, the leaf values are:
app.name - string, max 50 chars - 100 bytes (2 bytes per char)
app.version - string, max 20 chars - 40 bytes
app.author - map - reference - 8 bytes (4 bytes with compressed oops)
app.server.port - int - 4 bytes (32 bits)
app.server.host - string of unknown length - reference - 8 bytes
Each entry of the author hash map:
<author>.name - string, max 50 chars - 100 bytes
<author>.email - string, max 50 chars - 100 bytes
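With fixed slot sizes, each field's offset is the running sum of the sizes before it. This sketch computes the offsets of the app record from the sizes listed above, taking 4 bytes for the int port (a Java int is 32 bits) and 8 bytes per reference slot; RecordLayout and offsets are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RecordLayout {
    // Returns field -> byte offset, assigned in declaration order, plus the total size.
    static Map<String, Integer> offsets(Map<String, Integer> sizes) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int offset = 0;
        for (Map.Entry<String, Integer> field : sizes.entrySet()) {
            result.put(field.getKey(), offset);
            offset += field.getValue();
        }
        result.put("<total>", offset);
        return result;
    }

    static Map<String, Integer> appSizes() {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("name", 100);   // 50 chars * 2 bytes
        sizes.put("version", 40); // 20 chars * 2 bytes
        sizes.put("author", 8);   // reference slot
        sizes.put("port", 4);     // int
        sizes.put("host", 8);     // reference slot
        return sizes;
    }

    public static void main(String[] args) {
        // prints {name=0, version=100, author=140, port=148, host=152, <total>=160}
        System.out.println(offsets(appSizes()));
    }
}
```

Fixed maxima cost the full slot even for short values: "My App" still occupies all 100 bytes of the name slot, and each author entry is a separate 200-byte record.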
<-------------- app -------------->
                        <-server->
[name, version, author, port, host]   (ByteBuffer)
                  |
                  V
            [name, email]   (ByteBuffer per author entry)
I would then create ByteBuffers that store the data as bytes and access each field by its offset. This method requires one class per schema to hold the offsets.
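A minimal sketch of the ByteBuffer variant for the app record, assuming the fixed sizes listed above. One caveat: a ByteBuffer holds bytes, not object references, so the two reference slots (author and host) are stored here as an int index into a side Object[] table, which shrinks them from 8 to 4 bytes. AppRecord, its offsets and the side table are hypothetical; strings are written as fixed-width UTF-16 chars padded with '\0'.

```java
import java.nio.ByteBuffer;
import java.util.Map;

// Hypothetical fixed-layout record for "app": one class per schema holds the offsets.
final class AppRecord {
    static final int NAME_CHARS = 50;
    static final int VERSION_CHARS = 20;
    static final int NAME = 0;
    static final int VERSION = NAME + 2 * NAME_CHARS;      // 100
    static final int AUTHOR = VERSION + 2 * VERSION_CHARS; // 140: int index into refs
    static final int PORT = AUTHOR + 4;                    // 144
    static final int HOST = PORT + 4;                      // 148: int index into refs
    static final int SIZE = HOST + 4;                      // 152 bytes per record

    private final ByteBuffer buf;
    private final Object[] refs; // values that cannot be stored as bytes

    private AppRecord(ByteBuffer buf, Object[] refs) {
        this.buf = buf;
        this.refs = refs;
    }

    static AppRecord of(String name, String version, Map<String, ?> author, int port, String host) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);
        putFixed(buf, NAME, name, NAME_CHARS);
        putFixed(buf, VERSION, version, VERSION_CHARS);
        buf.putInt(AUTHOR, 0).putInt(PORT, port).putInt(HOST, 1);
        return new AppRecord(buf.asReadOnlyBuffer(), new Object[] {author, host});
    }

    // Writes s as UTF-16 chars and pads the rest of the slot with '\0'.
    private static void putFixed(ByteBuffer buf, int offset, String s, int maxChars) {
        if (s.length() > maxChars) {
            throw new IllegalArgumentException("Value longer than " + maxChars + " chars: " + s);
        }
        for (int i = 0; i < maxChars; i++) {
            buf.putChar(offset + 2 * i, i < s.length() ? s.charAt(i) : '\0');
        }
    }

    // Reads chars until the first '\0' or the end of the slot.
    private String getFixed(int offset, int maxChars) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < maxChars; i++) {
            char c = buf.getChar(offset + 2 * i);
            if (c == '\0') {
                break;
            }
            sb.append(c);
        }
        return sb.toString();
    }

    String name() { return getFixed(NAME, NAME_CHARS); }
    String version() { return getFixed(VERSION, VERSION_CHARS); }
    int port() { return buf.getInt(PORT); }
    String host() { return (String) refs[buf.getInt(HOST)]; }
    Map<?, ?> author() { return (Map<?, ?>) refs[buf.getInt(AUTHOR)]; }
}
```

Allocating one ByteBuffer per record adds the buffer object and its backing array header to every record; packing many records into one large buffer at recordIndex * SIZE would avoid that overhead.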
Are there any other, more efficient ways to do this?