
I have a dataset like the one below,

// e.g.
[{
  id: 'M1',
  description: 'Lorem description',
  fields: [{
    name: 'field_1',
    value: 'Lorem value 1'
  }]
}]

which I need to transform into,

[
  {
    id: 'M1',
    description: 'Lorem description',
    field_1: 'Lorem value 1'
  }
]

I wrote the code below to accomplish this. It works, but I don't think it's the best way to do it, and it gets noticeably slower as the dataset grows. How can I make my solution perform better?

const _sampleData = [{
    id: 'M1',
    description: 'Lorem description',
    fields: [{
      name: 'field_1',
      value: 'Lorem value 1'
    }]
  },
  {
    id: 'M2',
    description: 'Lorem description',
    fields: [{
        name: 'field_1',
        value: 'Lorem value 1'
      },
      {
        name: 'field_2',
        value: 'Lorem value 2'
      }
    ]
  }
];

function toObject(fields) {
  const out = {};
  for (const field of fields) {
    out[field.name] = field.value;
  }
  return out;
}

function getFlatSampleData() {
  const data = [];

  for (const item of _sampleData) {
    let out = {};
    for (const key in item) {
      if (Array.isArray(item[key])) {
        out = {
          ...out,
          ...toObject(item[key])
        };
      } else {
        out[key] = item[key];
      }
    }
    data.push(out);
  }

  return data;
}

console.log(getFlatSampleData());
– 0xdw
  • Ask on [Code Review](https://codereview.stackexchange.com/) – skara9 Jul 10 '22 at 04:05
  • How does the run time increase as you use larger input? For an input 10 times the size do you see more than a 10x increase in runtime? – Paul Rooney Jul 10 '22 at 04:39
  • @PaulRooney I don't really have a benchmark for this, but it feels like O(n^2). I can definitely feel the slowness when the page loads. – 0xdw Jul 10 '22 at 04:52
  • Maybe it is actually linear, O(n), and the slowness comes from the size: the dataset is around 1.5m items with close to 100 fields in each `fields` subarray. – 0xdw Jul 10 '22 at 04:54
  • My initial thought is that you are taking an overly generic approach. It's 3 loops, and you can probably remove some of them by naming the fields explicitly: you know the fields you want, `id` and `description` from the outer object and `name` and `value` from `fields`. – Paul Rooney Jul 10 '22 at 04:55
  • It could also be choking on storing a huge array, in which case a generator function might help. It does look linear, though, so the excess run time is probably just a result of the size of the data. I came up with [this](https://ideone.com/K6cBuU), which I can post as an answer if it ends up helping. It doesn't deal with the variability of things possibly being an array or not. – Paul Rooney Jul 10 '22 at 05:00
  • @PaulRooney Yes, that's correct. – 0xdw Jul 10 '22 at 05:04
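
A minimal sketch of the ideas from the comments above (naming the fields explicitly and yielding results from a generator); the helper name `flattenItems` is just illustrative, and it assumes every item has exactly `id`, `description` and a `fields` array of `{ name, value }` pairs:

function* flattenItems(items) {
  for (const item of items) {
    // Explicit field access: id and description from the outer object,
    // name/value pairs from the fields array.
    const out = { id: item.id, description: item.description };
    for (const { name, value } of item.fields) {
      out[name] = value;
    }
    // Yield one flattened item at a time so consumers can process results
    // lazily instead of materialising one huge array up front.
    yield out;
  }
}

// Usage: iterate lazily, or build the full array with Array.from(flattenItems(_sampleData)).
for (const flat of flattenItems(_sampleData)) {
  console.log(flat);
}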

2 Answers


The part that seems to be the culprit is as follows:

        out = {
          ...out,
          ...toObject(item[key])
        };

Instead, since you want to flatten your object, you could do something like this:

let obj = [{
  id: 'M1',
  description: 'Lorem description',
  fields: [{
    name: 'field_1',
    value: 'Lorem value 1'
  }]
}];
// Flatten iteratively with a work queue instead of spreading into new objects.
// Note: this collects every name/value pair into a single `output` object.
let output = {};
let itemQueue = [obj];
let limit = 0;
while (limit < itemQueue.length) {
  const current = itemQueue[limit];
  for (let key in current) {
    const value = current[key];
    if (Array.isArray(value) || typeof value === "object") {
      if (typeof value.name !== "undefined" && typeof value.value !== "undefined") {
        // A { name, value } pair becomes a direct property of the output.
        output[value.name] = value.value;
      } else {
        // Anything else that is object-like is queued for later processing.
        itemQueue.push(value);
      }
    } else {
      // Primitive values are copied over as-is.
      output[key] = value;
    }
  }
  limit++;
}
console.log(output);

The idea is to avoid constantly generating new objects, and to instead use a combination of a work queue and a loop.
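
If the desired output is still one flattened object per array item, as in the question, the same idea (filling a single object per item directly instead of spreading intermediate objects) applied to `_sampleData` from the question could look roughly like this:

function getFlatSampleData() {
  const data = [];
  for (const item of _sampleData) {
    const out = {};
    for (const key in item) {
      if (Array.isArray(item[key])) {
        // Copy the name/value pairs directly instead of building and
        // spreading intermediate objects on every iteration.
        for (const field of item[key]) {
          out[field.name] = field.value;
        }
      } else {
        out[key] = item[key];
      }
    }
    data.push(out);
  }
  return data;
}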

– Lajos Arpad

The required processing time is expected to increase as the dataset gets larger. The code accesses each element only once, and the transformation could not happen in fewer accesses, so from a high-level computational perspective its complexity is fine. Apart from some code improvements (also mentioned in the comments and in the other answer), I can propose the following two things.

  1. Preallocate the result array up front instead of calling `push` for every item:

    function getFlatSampleData() {
      const data = new Array(_sampleData.length); // preallocate the result array
      for (let i = 0; i < _sampleData.length; i++) {
        const item = _sampleData[i];
        let out = {};
        for (const key in item) {
          if (Array.isArray(item[key])) {
            out = {
              ...out,
              ...toObject(item[key])
            };
          } else {
            out[key] = item[key];
          }
        }
        data[i] = out; // write into the preallocated slot instead of pushing
      }
      return data;
    }
    
  2. Use workers or a parallel JavaScript framework to parallelize the process. Each item can be transformed independently, so, assuming there is spare computing power, multiple workers could each handle a different part of the input data and result array at the same time (see the sketch below).
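
For illustration, here is a minimal sketch of the worker idea, assuming a Node.js environment with its built-in `worker_threads` module and assuming `_sampleData` from the question is defined in the same file; the helper names `flattenInParallel` and `flattenChunk`, the chunking scheme and the worker count are arbitrary choices for the sketch:

const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
const os = require('os');

// Flatten one chunk of items; the same per-item logic as in the question.
function flattenChunk(chunk) {
  const out = new Array(chunk.length);
  for (let i = 0; i < chunk.length; i++) {
    const item = chunk[i];
    const flat = { id: item.id, description: item.description };
    for (const field of item.fields) {
      flat[field.name] = field.value;
    }
    out[i] = flat;
  }
  return out;
}

if (isMainThread) {
  // Split the input into one chunk per worker and merge the results.
  function flattenInParallel(data, workerCount = os.cpus().length) {
    const chunkSize = Math.ceil(data.length / workerCount);
    const jobs = [];
    for (let w = 0; w < workerCount; w++) {
      const chunk = data.slice(w * chunkSize, (w + 1) * chunkSize);
      jobs.push(new Promise((resolve, reject) => {
        // Each worker re-runs this file with its chunk as workerData.
        const worker = new Worker(__filename, { workerData: chunk });
        worker.once('message', resolve);
        worker.once('error', reject);
      }));
    }
    return Promise.all(jobs).then(parts => parts.flat());
  }

  flattenInParallel(_sampleData).then(result => console.log(result));
} else {
  // Worker side: flatten the chunk it was given and send it back.
  parentPort.postMessage(flattenChunk(workerData));
}

Transferring large arrays between threads has its own serialisation cost, so this only pays off when the dataset and the per-item work are heavy enough; in a browser, a `Worker` script communicating via `postMessage` would play the same role.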

– Spyros K