2

Suppose I have a list like this:

[ 2, 7, 2, 3, 1, 1, 4, 5, 3, 6, 4 ]

And I want to sort and remove duplicates to yield:

[ 1, 2, 3, 4, 5, 6, 7 ]

I can achieve this by removing duplicates and then sorting:

const uniqueAndSorted = xs => [ ...new Set(xs) ].sort((a, b) => a - b);

However, this seems inefficient, since I could probably detect duplicates as I do the sorting.

What is the optimal way to sort and remove duplicates from a list?

(JavaScript implementations are preferred; the function should be non-destructive)

sdgfsdh
  • What do you mean by *non-destructive*? Do you want to keep the original array without duplicates and sorted? Your given code does not match this. – Nina Scholz Aug 31 '18 at 09:52
  • The input array (`xs` here) should not be changed – sdgfsdh Aug 31 '18 at 09:54
  • https://stackoverflow.com/questions/1344500/efficient-way-to-insert-a-number-into-a-sorted-array-of-numbers – Roberto Zvjerković Aug 31 '18 at 09:58
  • I say, leave as much work to the engine as possible. Using stock functions and going low-level is usually better than writing a custom algorithm in a high-level language, if both are possible: better in both readability and performance. I'd take your `uniqueAndSorted` over writing a custom "do both at once" in JavaScript any day of the week. – Amadan Aug 31 '18 at 10:10
  • I don’t think you can get any better than filtering and then sorting. At least regarding big-O complexity. – algrid Aug 31 '18 at 13:33
  • @algrid big-O complexity is not everything. For example, these have the same big-O: `xs.map(x => x + 1).map(x => x * 2)` and `xs.map(x => (x + 1) * 2)`, but the latter will be faster in real usage. – sdgfsdh Aug 31 '18 at 13:37
  • @algrid "I don't think you can get any better than filtering and then sorting." That depends on how you define "better." Filter and sort requires that you allocate a dictionary. But if you sort, then filter, you can do it without a dictionary. In practice, which is faster will depend on the percentage of duplicates. If there are many (but I don't know if "many" means 2 times or 10 times) duplicates, then filtering first will probably be faster. – Jim Mischel Aug 31 '18 at 15:30
  • Do you really want the "most efficient" way? How do you define efficiency? Speed? Memory usage? Would you prefer a simple, straightforward method, or would you go for a very complex and fragile solution if it saved you a couple of microseconds? How many items are in the array? Concentrate on making something that works. Don't worry about speed unless what you come up with is too slow. – Jim Mischel Aug 31 '18 at 15:34

5 Answers

2

I am not sure whether this works in all browsers, but at least in Chrome you could do the following:

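function getSortedSetArray(arr) {
  // Used as a set. Modern engines enumerate integer-like keys in ascending
  // numeric order, so this only sorts correctly for non-negative integers.
  var map = {};

  arr.forEach(function (elem) {
    map[elem] = true;
  });

  // Object.keys returns strings, so convert them back to numbers.
  return Object.keys(map).map(Number);
}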
JanS
1

You could achieve this using an ES6 Set.

For example:

const uniqueAndSorted = xs => Array.from(new Set(xs)).sort((a, b) => a - b);

uniqueAndSorted([ 2, 7, 2, 3, 1, 1, 4, 5, 3, 6, 4 ]) should return [1, 2, 3, 4, 5, 6, 7]

Sovalina
Dilip
1

It depends on how many duplicates you have. If there are few duplicates, then sorting first and removing duplicates afterwards is faster. On the other hand, if you have lots of duplicates, then building a hash set first and sorting the result is the best option.
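As a rough sketch of the two orderings (the function names `sortThenDedupe` and `dedupeThenSort` are made up for illustration, and the input is assumed to contain only numbers, with no `NaN`):

```javascript
// Hypothetical helper: sort a copy, then drop adjacent duplicates.
function sortThenDedupe(xs) {
  // Copy first so the caller's array is left untouched.
  const sorted = xs.slice().sort((a, b) => a - b);
  const out = [];
  for (const x of sorted) {
    // After sorting, equal values are adjacent, so comparing against
    // the last kept value is enough to detect a duplicate.
    if (out.length === 0 || out[out.length - 1] !== x) {
      out.push(x);
    }
  }
  return out;
}

// Hypothetical helper: build a hash set first, then sort what is left.
function dedupeThenSort(xs) {
  return Array.from(new Set(xs)).sort((a, b) => a - b);
}
```

Both return a new array and leave the input alone. Which one is faster for a given input is something to measure rather than assume.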

Sources: What's the most efficient way to erase duplicates and sort a vector?

https://www.geeksforgeeks.org/how-to-sort-a-big-array-with-many-repetitions/

Another option is to use a "fat-pivot quicksort" or "ternary-split quicksort", which is faster than standard quicksort when the input has many duplicates:

https://www.toptal.com/developers/sorting-algorithms/quick-sort-3-way
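To illustrate how the three-way split can remove duplicates while sorting, here is a minimal sketch. It is not the in-place algorithm from the link: it is a simplified, non-destructive variant that allocates new arrays, and the name `quickSortUnique` is made up for this example. It assumes the input contains only numbers (no `NaN`).

```javascript
// Hypothetical helper: three-way partition around a pivot, keeping the
// "equal to pivot" group as a single element so duplicates disappear.
function quickSortUnique(xs) {
  if (xs.length === 0) return [];

  // Middle element as pivot; other pivot choices are possible.
  const pivot = xs[Math.floor(xs.length / 2)];
  const less = [];
  const greater = [];

  for (const x of xs) {
    if (x < pivot) less.push(x);
    else if (x > pivot) greater.push(x);
    // Values equal to the pivot are skipped here; the pivot itself
    // is added back exactly once below.
  }

  return quickSortUnique(less).concat([pivot], quickSortUnique(greater));
}
```

Every value equal to the pivot is discarded in a single pass, so inputs with many repetitions shrink quickly. Whether this beats the built-in `Set` plus `sort` in practice would need benchmarking.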

0

This works, but benchmarking a few methods would be best:

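function uniq_sort(a) {
    // Keys of `seen` are the values encountered so far.
    var seen = {};
    return a.filter(function(item) {
        // Keep only the first occurrence of each value.
        return seen.hasOwnProperty(item) ? false : (seen[item] = true);
    }).sort(function(x, y) {
        // Numeric comparator: the default sort compares values as strings.
        return x - y;
    });
}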
Davesoft
0
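var myData = [ 2, 7, 2, 3, 1, 1, 4, 5, 3, 6, 4 ];

// Keep each value the first time it appears, then sort numerically.
// Note: includes() rescans the accumulator on every step, so this is O(n²).
myData.reduce((x, y) => x.includes(y) ? x : [...x, y], []).sort((a, b) => a - b);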
Jordi Jordi