0
sampleData = [{
  RouteId: "1",
  InDirection: "1",
  AreaCode: ["41108", "41109", "41110", "41111"],
}, {
  RouteId: "1",
  InDirection: "2",
  AreaCode: ["41108", "41109", "411011"],
}, {
  RouteId: "2",
  InDirection: "1",
  AreaCode: ["41112", "41114"],
}, {
  RouteId: "2",
  InDirection: "2",
  AreaCode: ["41112", "41114"],
}, {
  RouteId: "3",
  InDirection: "1",
  AreaCode: ["41112", "41114"],
}, {
  RouteId: "3",
  InDirection: "2",
  AreaCode: ["41112", "41114","41108", "41109", "41110"],
}, {
  RouteId: "4",
  InDirection: "1",
  AreaCode: ["41112", "41114","41108", "41110" , "41120", "41121"],
}, {
  RouteId: "4",
  InDirection: "2",
  AreaCode: ["41112", "41114"],
}]

I want to sort the sampleData above by the number of entries in AreaCode and get the top 20 results. But I only want one object per RouteId. Every RouteId can have two InDirection values, 1 or 2. So from the data above I would like to remove

{
  RouteId: "1",
  InDirection: "2",
  AreaCode: ["41108", "41109", "411011"],
}

since it has fewer entries in AreaCode than the object with InDirection = 1

so the final sorted result should be

finalResult = [{
  RouteId: "1",
  InDirection: "1",
  AreaCode: ["41108", "41109", "41110", "41111"],
}, {
  RouteId: "2",
  InDirection: "1",
  AreaCode: ["41112", "41114"],
}, {
  RouteId: "3",
  InDirection: "2",
  AreaCode: ["41112", "41114","41108", "41109", "41110"],
}, {
  RouteId: "4",
  InDirection: "1",
  AreaCode: ["41112", "41114","41108", "41110" , "41120", "41121"],
}]

Here is what I have so far:

    const filteredItems = sampleData.filter(item => {
      const otherItem = transformedData.find(i => i.RouteId === item.RouteId && i.InDirection !== item.InDirection);
      if (otherItem) {
        return item.AreaCode.length > otherItem.AreaCode.length;
      } else {
        return true;
      }
    });

But this misses the case where both AreaCode arrays have the same length, and the final result is not sorted.

Peter Seliger
Bonzo
  • I doubt that the OP's provided `sampleData` structure is actually what the OP really works with. It neither does meet the `apiResponse` data's structure of [another question of the OP](https://stackoverflow.com/questions/76652030/extract-data-from-object-and-create-new-object-using-filter-and-map) nor does it resemble a computed useful intermediate data structure of the just linked problem. Especially the second observation leaves me puzzled. – Peter Seliger Jul 10 '23 at 15:09
  • @PeterSeliger: Ok, but so what? In fact, it looks as though this input is the requested output from the other question, as though the OP was breaking the problem down into what seemed like logical steps. – Scott Sauyet Jul 10 '23 at 15:33
  • @ScottSauyet ...1/2... The new Q. should have asked for **the most straightforward solution** which takes as [input the originally provided `apiResponse` data](https://stackoverflow.com/questions/76652030/extract-data-from-object-and-create-new-object-using-filter-and-map) and has as output the data-structure of the already there changed requirements. Since the above new question changes or omits the underlying basic data-structure the OP will receive solutions of entirely different approaches that of course target the above provided `sampleData` structure. The latter unfortunately is a... – Peter Seliger Jul 10 '23 at 15:53
  • @ScottSauyet ...2/2... result of the OP's before asked question of another thread where the OP himself was first clear about the final data but then changed the requirements by comments (the original question still remains unchanged regarding all the data). Thus the OP most probably is going to connect two solutions based on an intermediate result the OP did not even want himself at first place. All of that leads to a solution which makes the processing of the actually to be used data-structure more _"resource hungry"_ and much more complicated, thus less understandable and less maintainable. – Peter Seliger Jul 10 '23 at 15:55
  • @Bonzo ... check on the [last provided solution (_**Edit 2**_) to your original problem](https://stackoverflow.com/a/76653316/2627243) which operates the real underlying `apiResponse` data. It computes the result according to the requirements here (and the changed requirements there), based on lookups and mainly within a single `reduce` and a single `map` task. Thus, it implements a much more consistent and straightforward approach than anything you are going to assemble yourself from all the answers of both of your latest (most recent) questions. – Peter Seliger Jul 10 '23 at 16:16
  • @PeterSeliger: Sure, the combination of the questions smacks a bit of an [XY problem](https://meta.stackexchange.com/q/66377). And I didn't follow everything that happened with the previous question. But this one seem reasonably well-asked. And AFAICT, you don't know that the intermediate result is not also needed for something. (BTW, I coded a solution for this before leaving my desk for lunch, and then never had time to complete it before leaving work for the day. I'll try to post tomorrow, with a somewhat different approach.) – Scott Sauyet Jul 11 '23 at 02:04
  • @PeterSeliger: I see your deleted answer, and I hope you feel like completing it. I posted my own, but it's a very different technique, and I'd love to see your posted as well. – Scott Sauyet Jul 11 '23 at 15:04
  • Before providing a solution one has to get an understanding of what the OP's ... _"and get top 20 results"_ ... actually means. This information is crucial because the OP himself states ... _"Since in the real scenario `sampleData` is very large, is it possible to limit it to 20 [entries] in sorted result."_ ... Until now the OP did not respond to _Harrison's_ question which emphasizes the very unclear requirement. Any so far provided solution is in danger of having picked the wrong approach, depending on what exactly ... _"I want to sort above `sampleData` ... and get top 20 results. "_ means. – Peter Seliger Jul 11 '23 at 15:25
  • 1/3 ... _"Top 20"_ is not specific for it can mean the first 20 matching results of an ideal `sampleData` array (like with the OP's example) which does provide the items of same `RouteId` values always pairwise. With this scenario one just could limit the provided array's `length` value to `40` and proceed with e.g. a reducer task that just picks the `RouteId` item with the greater `AreaCode` array length. – Peter Seliger Jul 11 '23 at 15:47
  • 2/3 ... On the other hand, for a fully complete but totally randomly ordered set of data items, one might need to sort the entire `sampleData` array by each of the item's `AreaCode` array length value in descending order, followed by an interruptible iteration process which collects unique `RouteId` items until one has reached an amount of 20 of such unique items. – Peter Seliger Jul 11 '23 at 15:47
  • 3/3 ... There are also possibilities in between. A lot depends on the source data structure and also on how reliable/stable this data structure is going to be provided. Especially since the OP talks about vast data, any of the above information has an impact on the chosen approach. None of the OP's 2 most recent questions got provided with the bare minimum of crucially needed information right from start. The OP instead in both threads did shift focus and continued changing requirements whilst people already having come up with solutions based on then already outdated assumptions. – Peter Seliger Jul 11 '23 at 16:05

5 Answers

1

I personally don't think filter() is the best solution here.

If you know that the provided structure always contains exactly two items with the same RouteId, one directly after the other, then you can split the array into chunks of size 2 and map() each chunk by comparing its two elements based on AreaCode.length.

const desired = chunkN(2, sampleData).map(([inDirection1, inDirection2]) => (
  inDirection1.AreaCode.length >= inDirection2.AreaCode.length
  ? inDirection1
  : inDirection2
));

The code above uses a chunkN() helper that I've defined in the snippet below. It essentially cuts up the array in chunks of size N. Then we use the conditional (ternary) operator to select if inDirection1 or inDirection2 is the item with the largest AreaCode.length.


If the provided elements are not strictly structured, can be in any order, and there are possibly more than two inDirection options, I would suggest first grouping all the elements based on RouteId. Then sort() each group based on AreaCode.length in descending order and select the first item.

const groups = groupBy(item => item.RouteId, sampleData);
const desired = Array.from(groups.values(), (group) => (
  // ascending = a - b, descending = b - a
  group.sort((a, b) => b.AreaCode.length - a.AreaCode.length)[0]
));

The groupBy() helper returns a Map instance. In the example above RouteId is used as the key. The value is an array of the items that match this key.

function solutionA(sampleData) {
  const desired = chunkN(2, sampleData).map(([inDirection1, inDirection2]) => (
    inDirection1.AreaCode.length >= inDirection2.AreaCode.length
    ? inDirection1
    : inDirection2
  ));
  console.log("solutionA", desired);
}

function solutionB(sampleData) {
  const groups = groupBy(item => item.RouteId, sampleData);
  const desired = Array.from(groups.values(), (items) => (
    // ascending = a - b, descending = b - a
    items.sort((a, b) => b.AreaCode.length - a.AreaCode.length)[0]
  ));
  console.log("solutionB", desired);
}

// helpers
function chunkN(n, array) {
  const chunks = [];
  for (let index = 0; index < array.length; index += n) {
    chunks.push(array.slice(index, index + n));
  }
  return chunks;
}

function groupBy(fnKey, iterable) {
  const groups = new Map();
  for (const item of iterable) {
    const key = fnKey(item);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(item);
  }
  return groups;
}

// data + run solutions
const data = [{
  RouteId: "1",
  InDirection: "1",
  AreaCode: ["41108", "41109", "41110", "41111"],
}, {
  RouteId: "1",
  InDirection: "2",
  AreaCode: ["41108", "41109", "411011"],
}, {
  RouteId: "2",
  InDirection: "1",
  AreaCode: ["41112", "41114"],
}, {
  RouteId: "2",
  InDirection: "2",
  AreaCode: ["41112", "41114"],
}, {
  RouteId: "3",
  InDirection: "1",
  AreaCode: ["41112", "41114"],
}, {
  RouteId: "3",
  InDirection: "2",
  AreaCode: ["41112", "41114","41108", "41109", "41110"],
}, {
  RouteId: "4",
  InDirection: "1",
  AreaCode: ["41112", "41114","41108", "41110" , "41120", "41121"],
}, {
  RouteId: "4",
  InDirection: "2",
  AreaCode: ["41112", "41114"],
}];
solutionA(data);
solutionB(data);

Both chunkN() and groupBy() are generic helpers that can be used in lots of other scenarios. You can also achieve the same without helpers (as shown in the two code blocks below). But defining these helpers separates the thing you want to do (finding the item with the most AreaCodes per RouteId) from the general-purpose logic, like grouping or chunking.

const desired = [];
for (let index = 0; index < sampleData.length; index += 2) {
  const inDirection1 = sampleData[index];
  const inDirection2 = sampleData[index + 1];
  desired.push(
    inDirection1.AreaCode.length >= inDirection2.AreaCode.length
    ? inDirection1
    : inDirection2
  );
}

const groups = new Map();
for (const item of sampleData) {
  if (!groups.has(item.RouteId)) groups.set(item.RouteId, []);
  groups.get(item.RouteId).push(item);
}

const desired = Array.from(groups.values(), (group) => (
  // ascending = a - b, descending = b - a
  group.sort((a, b) => b.AreaCode.length - a.AreaCode.length)[0]
));
3limin4t0r
1

I would definitely break this into several steps. First, we need to sort the elements by the count of AreaCodes, then we need to take the top twenty, subject to the constraint that we can have only one per RouteId, and then, it seems, from your requested output, we should re-sort these according to their original order in the data (or according to their RouteId.) This last step seems really strange to me. If you're taking them according to an ordering, why wouldn't you keep that ordering when you shorten the list? In the code below, it's trivially easy to skip that step. Just remove the final sort call.

Here our sample only takes the first three, not the first twenty, as we don't have enough data for that.

const by = (fn, dir = 'ASCENDING') => (a, b, x = fn(a), y = fn(b)) => 
  (dir === 'ASCENDING' ? 1 : -1) * (x < y ? -1 : x > y ? 1 : 0)

const takeFirstOneOfEachUpTo = (n, fn) => ([x, ...xs], m = new Set(), k = fn(x)) =>
  x == undefined || n < 1 
    ? [] 
  : m.has(k)
    ? takeFirstOneOfEachUpTo(n, fn)(xs, m)
  : [x, ...takeFirstOneOfEachUpTo(n -1, fn)(xs, m.add(k))]


const process = (n) => (xs) => 
  takeFirstOneOfEachUpTo (n, x => x.RouteId) (
    [...xs].sort((by(x => x.AreaCode.length, 'DESCENDING')))
  ).sort(by(x => xs.indexOf(x))) // or .sort(by(x => x.RouteId)) // or skip altogether

const sampleData = [{RouteId: "1", InDirection: "1", AreaCode: ["41108", "41109", "41110", "41111"]}, {RouteId: "1", InDirection: "2", AreaCode: ["41108", "41109", "411011"]}, {RouteId: "2", InDirection: "1", AreaCode: ["41112", "41114"]}, {RouteId: "2", InDirection: "2", AreaCode: ["41112", "41114"]}, {RouteId: "3", InDirection: "1", AreaCode: ["41112", "41114"]}, {RouteId: "3", InDirection: "2", AreaCode: ["41112", "41114", "41108", "41109", "41110"]}, {RouteId: "4", InDirection: "1", AreaCode: ["41112", "41114", "41108", "41110", "41120", "41121"]}, {RouteId: "4", InDirection: "2", AreaCode: ["41112", "41114"]}]

console.log(process(3)(sampleData))

Here we have a utility function by, which makes comparators to pass to sort. It takes a function from your sort item to something sortable with < (such as a number, a string or a date), and optionally a direction (anything other than the default, 'ASCENDING', is taken to mean descending sort), and returns a function which will return -1, 0, or +1 when supplied two sort items. Feeding this into Array.prototype.sort will sort these items according to the result of that function.

Then we have a helper function, takeFirstOneOfEachUpTo, which selects the first n items of its input, subject to the constraint that for the supplied function, we don't repeat inputs for which the function yields the same value. This is a fairly simple recursion, and it has the advantage over a reduce-based solution of stopping early. It's fine for numbers like 20. If you were to be collecting the top 1000, we might want a reduce-based solution instead, which doesn't have the recursion. This function maintains a Set of values it's already seen and skips subsequent ones. That means that our function should probably generate a primitive value, or choose somehow from a fixed list of reference values. We're going to use it with the RouteId strings, so that's not a problem.
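For larger n, the same selection can be written without recursion. What follows is only a sketch, not a drop-in replacement from the code above: the name `takeFirstOneOfEachUpToLoop` and the small `items` array are made up for illustration. It keeps the early exit and the Set of already-seen keys.

```javascript
// Hypothetical iterative counterpart to takeFirstOneOfEachUpTo (illustrative name).
const takeFirstOneOfEachUpToLoop = (n, fn) => (xs) => {
  const seen = new Set();
  const result = [];
  for (const x of xs) {
    if (result.length >= n) break; // stop early once n items are collected
    const key = fn(x);
    if (seen.has(key)) continue;   // skip repeats of an already-seen key
    seen.add(key);
    result.push(x);
  }
  return result;
};

// Made-up input, already sorted descending by `size`.
const items = [
  { RouteId: "4", size: 6 },
  { RouteId: "3", size: 5 },
  { RouteId: "4", size: 2 },
  { RouteId: "1", size: 4 },
  { RouteId: "2", size: 2 },
];

const firstThree = takeFirstOneOfEachUpToLoop(3, (x) => x.RouteId)(items);
console.log(firstThree.map((x) => x.RouteId)); // ["4", "3", "1"]
```

Because it neither recurses nor spreads arrays, this version should not run into call-stack limits for large n.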

Our main function is process, named because I don't have enough context to give it a more meaningful one. It first sorts the inputs descending by the lengths of their AreaCode arrays, then it calls our helper function, passing it 20 (or 3 or however many we want to collect) and a function which gets the RouteId, thus grabbing the top 20 elements, but only one per RouteId. Finally -- in the step I find unnecessary -- it sorts the results according to their position in the original list.


This breakdown makes a great deal of sense to me. It is not the only way to approach the problem. Peter Seliger has raised an objection, based on an earlier question which used a less compact format to generate the input structure of this question. He has a point. Perhaps there's a useful way to generate this final format based on your original data.

In fact, there is technically an asymptotically more time-efficient manner of doing this. I know this because we are using a sort on your input data, meaning this is at minimum an O(n log(n)) technique. But collecting the maximum of a list, or the k-largest elements for a fixed k is known to be a linear problem, O(n). Theoretically, then, there is a faster solution, at least for extremely long lists. But unless you hit a performance wall with this, I would not bother. The techniques I know for k-largest require an amount of code that grows with the size of k, or one whose running time grows quadratically with k (still fixed with respect to n, but large enough to be annoyingly impractical for anything but extremely long lists.) There may be others I don't know that don't have these limitations, but again, I would search for them only with a demonstrated performance problem.
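A cheaper middle ground, which is not a true linear-time selection, is to find the best item per RouteId in one pass and then sort only those winners. The sketch below assumes the item shape from the question; `topPerRoute`, `limit` and the `routes` sample are illustrative names, not part of the code above. The sort then runs over the number of distinct RouteIds rather than over the whole input.

```javascript
// Hypothetical helper; `topPerRoute` and `limit` are illustrative names.
function topPerRoute(items, limit = 20) {
  const bestByRoute = new Map();
  for (const item of items) {
    const best = bestByRoute.get(item.RouteId);
    // Strict `>` keeps the earlier item when the lengths are equal.
    if (!best || item.AreaCode.length > best.AreaCode.length) {
      bestByRoute.set(item.RouteId, item);
    }
  }
  // Only the per-route winners are sorted, not the whole input.
  return [...bestByRoute.values()]
    .sort((a, b) => b.AreaCode.length - a.AreaCode.length)
    .slice(0, limit);
}

const routes = [
  { RouteId: "1", InDirection: "1", AreaCode: ["41108", "41109", "41110", "41111"] },
  { RouteId: "1", InDirection: "2", AreaCode: ["41108", "41109", "411011"] },
  { RouteId: "2", InDirection: "1", AreaCode: ["41112", "41114"] },
  { RouteId: "2", InDirection: "2", AreaCode: ["41112", "41114"] },
  { RouteId: "3", InDirection: "2", AreaCode: ["41112", "41114", "41108", "41109", "41110"] },
];

const top2 = topPerRoute(routes, 2);
console.log(top2.map((r) => `${r.RouteId}/${r.InDirection}`)); // ["3/2", "1/1"]
```

The order among different routes with equally long AreaCode arrays may differ from the sort-first version, so treat this as a starting point rather than an exact equivalent.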

For this reason, if I were to solve this problem based on the original data, I would do it exactly as here, using the intermediate format you present as input to this question.

Peter's (sadly deleted) answer offers another approach. I hope he finishes it and restores it, as it offers an architecturally different way of approaching this, and uses your original data in an interesting way.

Scott Sauyet
  • 1
    Thank you for taking care of and encouraging me. – Peter Seliger Jul 11 '23 at 18:33
  • I had to wait for a new day in order to give your functional approach the right amount of attention. It's plugged together skillfully, as always. – Peter Seliger Jul 12 '23 at 12:01
  • @Peter: I do want to find time to come back to this. I can think of ways to do this more efficiently if the input size is very large, and if "top 20" is not just the precursor to a *paginated* request. And my pretty `takeFirstOneOfEachUpTo ` (well, pretty except for the name -- that looks like one you might write! ;-) ) is probably a lot less performant than a `while` loop, although that's still for the moment dominated by the `sort`. But I probably won't find the time, especially as the OP doesn't seem to be around. – Scott Sauyet Jul 12 '23 at 14:06
1

The next provided approach takes into account following of the OP's sentences ...

"I want to sort above sampleData based on number of entries in AreaCodes and get top 20 results."

... and it further assumes that the OP wants to retrieve the top 20 unique RouteId data-items with each item featuring the largest possible AreaCode array. In addition, only one of the items which feature the same RouteId value but have each a distinct InDirection value, is allowed to be part of the "Top 20" (even in case the related but discarded item features a still larger AreaCode array than all other source data items).

Because the OP started mentioning in one of the comments ...

"Since in the real scenario sampleData is very large, is it possible ..."

... following approach has been chosen.

It seems to be unavoidable that one has to sort the provided source data array (sampleData) in its entirety. The sorting process already is the most expensive part. The array gets sorted in descending order by the length of each item's AreaCode array.

But one can not simply limit the sorted array to its first 20 items.

One has to iterate the array and has to distinguish between items that share the same RouteId value and the others. For the former, one has to implement a task which picks the item with the larger AreaCode array (or, in case both are of equal length, according to the OP's requirements, the first occurring item).

Since one needs to have a bit more control over the iteration, a while based loop has been chosen which allows the early exit in case of having reached the "Top 20" before having finished iterating the array entirely.

And in order to speed up the task which picks the correct item out of a pair of related same RouteId-value items, the approach introduces a Set based lookup where one, based on an item's RouteId, can check if the correct RouteId item already has been processed and collected into the result array.

The latter then can be sorted again ascending by each item's RouteId, or result could be simply left as is.

function getTopAmountOfUniqueRouteIdItemsOfLargestAreaCodeData(
  sourceData = [], topAmount = 20
) {
  // // in order to not mutate the passed `sourceData` reference.
  // sourceData = [...sourceData].sort( /* ... */ );

  // - the most expensive part of the approach.
  // - sort the provided data array in its entirety.
  sourceData
    // takes advantage of the `-` operator's type coercion.
    .sort((a, b) => b.AreaCode.length - a.AreaCode.length);

  // ... pushing data directly into the return value ..
  const result = [];
  // ... and accomplishing the necessary pick on a `Set`
  //     based lookup does speed up the rest of the task.
  const lookup = new Set;

  let collectionCount = 0;
  let item, idx = -1;

  while ((item = sourceData[++idx]) && (collectionCount < topAmount)) {
    // - keep iterating until no item has been left
    //   or until the top amount has been reached.
    const { RouteId } = item;

    if (!lookup.has(RouteId)) {
      lookup.add(RouteId);

      result.push(item);

      ++collectionCount;
    }
  }
  // ... either `result` as sorted return value ...
  return result.sort((a, b) => a.RouteId - b.RouteId);

  // // ... or, just returning `result` as is ...
  // return result;
}

const result =
  getTopAmountOfUniqueRouteIdItemsOfLargestAreaCodeData(sampleData);

console.log({ result });
<script>
  const sampleData = [{
    RouteId: "1",
    InDirection: "1",
    AreaCode: ["41108", "41109", "41110", "41111"],
  }, {
    RouteId: "1",
    InDirection: "2",
    AreaCode: ["41108", "41109", "411011"],
  }, {
    RouteId: "2",
    InDirection: "1",
    AreaCode: ["41112", "41114"],
  }, {
    RouteId: "2",
    InDirection: "2",
    AreaCode: ["41112", "41114"],
  }, {
    RouteId: "3",
    InDirection: "1",
    AreaCode: ["41112", "41114"],
  }, {
    RouteId: "3",
    InDirection: "2",
    AreaCode: ["41112", "41114","41108", "41109", "41110"],
  }, {
    RouteId: "4",
    InDirection: "1",
    AreaCode: ["41112", "41114","41108", "41110" , "41120", "41121"],
  }, {
    RouteId: "4",
    InDirection: "2",
    AreaCode: ["41112", "41114"],
  }];
</script>
Peter Seliger
0

You want to make use of the sort and filter functions for arrays, though there may be other ways of doing things.

const sampleData = [{
  RouteId: "1",
  InDirection: "1",
  AreaCode: ["41108", "41109", "41110", "41111"],
}, {
  RouteId: "1",
  InDirection: "2",
  AreaCode: ["41108", "41109", "411011"],
}, {
  RouteId: "2",
  InDirection: "1",
  AreaCode: ["41112", "41114"],
}, {
  RouteId: "2",
  InDirection: "2",
  AreaCode: ["41112", "41114"],
}, {
  RouteId: "3",
  InDirection: "1",
  AreaCode: ["41112", "41114"],
}, {
  RouteId: "3",
  InDirection: "2",
  AreaCode: ["41112", "41114", "41108", "41109", "41110"],
}, {
  RouteId: "4",
  InDirection: "1",
  AreaCode: ["41112", "41114", "41108", "41110", "41120", "41121"],
}, {
  RouteId: "4",
  InDirection: "2",
  AreaCode: ["41112", "41114"],
}]


const routeIdsFound = []
const result = sampleData.sort(
  (a, b) => b.AreaCode.length - a.AreaCode.length
).filter(item => {
  if (!routeIdsFound.includes(item.RouteId)) {
    routeIdsFound.push(item.RouteId)
    return true
  } else {
    return false
  }
}).sort((a, b) => parseInt(a.RouteId) - parseInt(b.RouteId))

console.log(result);

You may also want to use something like Lodash (if you're not opposed to using an external library); perhaps uniqBy might be of interest (or sortedUniqBy?)
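To show what a uniqBy-style step would look like here without pulling in the library, below is a hand-rolled stand-in. It is not Lodash itself: the `uniqBy` function is written for this sketch and keeps the first occurrence per key, which is how I understand Lodash's helper to behave. The `sampleRoutes` data and the `slice(0, 20)` limit are assumptions for illustration.

```javascript
// Hand-rolled stand-in for a uniqBy-style helper (not Lodash itself).
function uniqBy(array, keyFn) {
  const seen = new Set();
  return array.filter((item) => {
    const key = keyFn(item);
    if (seen.has(key)) return false; // a previous item already claimed this key
    seen.add(key);
    return true;
  });
}

// Made-up sample in the shape of the question's data.
const sampleRoutes = [
  { RouteId: "A", InDirection: "1", AreaCode: ["41108"] },
  { RouteId: "A", InDirection: "2", AreaCode: ["41108", "41109", "41110"] },
  { RouteId: "B", InDirection: "1", AreaCode: ["41112", "41114"] },
];

const picked = uniqBy(
  // Copy before sorting so the original array is not mutated.
  [...sampleRoutes].sort((a, b) => b.AreaCode.length - a.AreaCode.length),
  (item) => item.RouteId
).slice(0, 20);

console.log(picked.map((r) => `${r.RouteId}/${r.InDirection}`)); // ["A/2", "B/1"]
```

Using a Set for the lookup avoids the repeated `includes` scan over an array of found ids.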

Harrison
  • Since in the real scenario sampleData is very large, is it possible to limit it to 20 entries in the sorted result? – Bonzo Jul 10 '23 at 14:20
  • Meaning you only sort the first 20 from whatever `sampleData` is? Or to only get the first 20 `RouteId`s? Alternatively, If you have lots of data, it may be important to look at how that data is being generated in the first place (such as database queries) to try and reduce the amount of data upstream – Harrison Jul 10 '23 at 14:49
  • 1
    @Bonzo ... again, you're changing requirements on the fly (worse, you provide them drop by drop). Especially for the processing of a vast amount of data one needs to know all the requirements in advance. Otherwise the chances of ending up with a solution that does waste resources and/or does need an unnecessary long processing time are increasingly high. – Peter Seliger Jul 10 '23 at 17:39
0

You can do this in steps:

  • First, sort with Array#sort in descending order of most AreaCodes
  • Then, use Array#reduce to exclude any element whose RouteId already exists prior to the element's index

const 
      sampleData = [{ RouteId: "1", InDirection: "1", AreaCode: ["41108", "41109", "41110", "41111"], }, { RouteId: "1", InDirection: "2", AreaCode: ["41108", "41109", "411011"], }, { RouteId: "2", InDirection: "1", AreaCode: ["41112", "41114"], }, { RouteId: "2", InDirection: "2", AreaCode: ["41112", "41114"], }, { RouteId: "3", InDirection: "1", AreaCode: ["41112", "41114"], }, { RouteId: "3", InDirection: "2", AreaCode: ["41112", "41114","41108", "41109", "41110"], }, { RouteId: "4", InDirection: "1", AreaCode: ["41112", "41114","41108", "41110" , "41120", "41121"], }, { RouteId: "4", InDirection: "2", AreaCode: ["41112", "41114"], }],
      
      sortFiltered = sampleData.sort((a,b) => b.AreaCode.length - a.AreaCode.length).reduce(
          (filtered,cur) => filtered.find(a => a.RouteId === cur.RouteId) ? filtered : [...filtered,cur], []
      );
      
      
console.log( sortFiltered );
PeterKA