0

Clarification:

As some people have pointed out, this does look like an "Is this code OK?" kind of question. The main thing I'm actually curious about is this: how does the .hasOwnProperty method work?
I mean: IE's JScript engine (<9 at least) doesn't always use hash tables, so I assume it does little more than iterate over all properties of that object and then check whether it gets those properties from another object higher up the prototype chain. Is this a fair assumption to make? After all, at some level all code gets translated into loops and branches, but if IE doesn't do hash tables, doesn't that mean .hasOwnProperty is just some sugar, there so you don't have to write the loop?
I think I got this notion from one of DC's blog posts or videos, and it could well be that he was talking about arrays and the quirky things they are (or rather, can be) in JS. I can't find the video or blog post at the moment. And seeing as JS arrays are often abused, as I think many of you will agree, I thought the answers to this question could serve as a decent reference. That's why I didn't post it on Code Review.
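
For reference, here is a minimal sketch of the observable behaviour in question. It says nothing about how any particular engine stores properties internally; it only shows that hasOwnProperty answers for the object itself, while the `in` operator and for...in also consult the prototype chain. The helper name `splitKeys` is made up for this illustration, and `Object.create` assumes an ES5 engine (so not IE<9).

```javascript
// Own vs. inherited properties, as seen from script code.
var base = { inherited: 1 };
var child = Object.create(base); // ES5: `base` becomes child's prototype
child.own = 2;

// Hypothetical helper: splits the keys a for...in loop visits by ownership.
function splitKeys(obj) {
    var result = { own: [], inherited: [] };
    for (var key in obj) {
        // Borrow the method from Object.prototype so a shadowed
        // `hasOwnProperty` property on obj cannot interfere.
        if (Object.prototype.hasOwnProperty.call(obj, key)) {
            result.own.push(key);
        } else {
            result.inherited.push(key);
        }
    }
    return result;
}

var keys = splitKeys(child);
// keys.own       -> ["own"]
// keys.inherited -> ["inherited"]
// "toString" in child               -> true  (found on Object.prototype)
// child.hasOwnProperty("toString")  -> false (not defined on child itself)
```

So whatever the engine does underneath, the check itself never needs to walk further than the object it is called on.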

As far as the current answers go, and since my question started off from the wrong angle (focusing on the code more than the mechanisms behind it), let me just be so corny as to thank you all for pointing out the issues I didn't think of.


Recently, I augmented the array prototype for a script I was working on (don't shoot, just yet). In order not to reinvent the wheel, I went and searched for some examples of how other people went about it.

Strangely, this is quite common, though it is plainly needlessly complex. I also found this as an alternative; apart from the poster being rather smug about his fast algorithm, I still feel there is room for improvement.

I know I might come across as a smug wise guy now, but my unique method actually looks like this:

Array.prototype.unique = function()
{
    'use strict';
    var i,obj,ret;
    ret = [];
    obj = {};
    for (i=0;i<this.length;i++)
    {
        if (!obj.hasOwnProperty(this[i]))
        {
            ret.push(this[i]);
            obj[this[i]] = i;
        }
    }
    return ret;
};

This way, there is just no need for a second loop AFAIK, is there? Unless, of course, the hasOwnProperty method is dead slow, but somehow I doubt that: in this case the chain can only go back one level, to Object.prototype.

The second link I posted contains some statistics and speed comparisons, but as we all know, they mean next to nothing in the real world. Could anyone point me in the direction of a good article on JS and benchmarking (apart from the one on John Resig's blog)?

So, just out of curiosity: does any one of you see a problem with this method? Some more info: Object.prototype is unchanged, undefined is undefined, there are no globals and no frameworks, and this method isn't blindly implemented (if (!Array.prototype.unique){...}).

Elias Van Ootegem
  • Looks fine to me. I think this would be better posted in http://codereview.stackexchange.com/ as it doesn't really fit the Q&A format of stackoverflow. – jfriend00 Aug 20 '12 at 17:10
  • It only works with arrays containing primitives and even then you will have problems with type coercion. But the other "fast algorithm" has the same problem. – Felix Kling Aug 20 '12 at 17:14
  • In your algorithm, you end up doubling the amount of space required to run, compared to the first link you posted. – Waleed Khan Aug 20 '12 at 17:18
  • As addition to my comment, imagine the array `["true", true]`. The result will be `[true]`. Actually, your method is a bit better than the other one, since you are adding the original value to the final array and not the object key (which would result in all values being converted to strings) but it will still have the problem that for comparison, all values are converted to strings. – Felix Kling Aug 20 '12 at 17:21
  • @arxanas: You're right, but would one notice that on substantial arrays? The one (close to) valid point the guy in the second link made was that the first take is `N^2` (not entirely true, if you ask me), whereas his take is closer to `2N`. – Elias Van Ootegem Aug 20 '12 at 20:38

3 Answers

4

Here is an implementation that considers type correctly, is much faster than the naive nested loops, and maintains the original array's order:

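Array.prototype.unique = function(){
    var r, o, i, j, t, tt;
    r = [];   // result, in original order
    o = {};   // buckets: string key -> array of values seen under that key
    for(i = 0; i < this.length; i++){
        t = this[i];
        // fetch (or create) the bucket for this value's string form
        tt = o[t] = o[t] || [];
        // strict comparison inside the bucket keeps 1 and "1" apart
        for(j = 0; j < tt.length; j++)
            if(tt[j] === t)
                break;
        // j reached the end: value not seen yet, so record it
        if(j == tt.length)
            r.push(tt[j] = t);
    }
    return r;
};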

I made a JSPerf to compare these implementations.

  • unique1 is the nested loops.
  • unique2 is the fast algorithm you linked to.
  • unique3 is your version.
  • unique4 is mine.
  • unique5 is KooiInc's answer (added later).
  • unique6 is pimvdb's answer (added later).

While unique2 is the fastest, it has the problem that it considers "1" and 1 as equal. unique4 comes in third for speed but is much faster than unique1 and gives correct output. All four variations actually give different output:

=> [1, "1", 1, 2, 3, 4, 1, 2, 3, "2", "3", "4", "true", "true", true].unique1()
// ["1", 4, 1, 2, 3, "2", "3", "4", "true", true]

=> [1, "1", 1, 2, 3, 4, 1, 2, 3, "2", "3", "4", "true", "true", true].unique2()
// [1, "2", "3", "4", true]

=> [1, "1", 1, 2, 3, 4, 1, 2, 3, "2", "3", "4", "true", "true", true].unique3()
// [1, 2, 3, 4, "true"]

=> [1, "1", 1, 2, 3, 4, 1, 2, 3, "2", "3", "4", "true", "true", true].unique4()
// [1, "1", 2, 3, 4, "2", "3", "4", "true", true]
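
The differences between these outputs mostly come down to property keys being strings. A small sketch of that effect follows; `typedKey` is a hypothetical helper, not part of any of the implementations compared above, and `Object.keys` assumes an ES5 engine.

```javascript
// Property keys are strings, so 1 and "1" land on the same key.
var seen = {};
seen[1] = "number";
seen["1"] = "string";                     // overwrites the entry above
var keyCount = Object.keys(seen).length;  // 1

// Hypothetical helper: prefix the key with typeof so primitives of
// different types no longer collide. Plain objects still all map to
// "object:[object Object]", so this only helps for primitives.
function typedKey(value) {
    return typeof value + ":" + String(value);
}

var typed = {};
typed[typedKey(1)] = true;    // key "number:1"
typed[typedKey("1")] = true;  // key "string:1"
var typedCount = Object.keys(typed).length; // 2
```

Bucketing by string key and then comparing with `===`, as unique4 does, gets the same separation without building a composite key.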
Paul
  • This returns `[Object {yada="yada" }, Object { yada="yada" }]` for `[{yada:'yada'},{yada:'yada'}].unique()` (in the FF14.01-console) – KooiInc Aug 21 '12 at 07:00
  • @KooiInc That is because two object literals are unique by definition. Note the difference between `[{a: 'a'},{a: 'a'},{a: 'a'}]` which has three unique objects and `var a = {a: 'a'}; [a, a, a]` which has one object repeated three times. – Paul Aug 21 '12 at 15:05
  • @KooiInc It should be easy to modify the function to take in a comparator function if you want to support a custom equivalence definition. – Paul Aug 21 '12 at 15:06
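
To illustrate the comparator idea from the last comment, here is one possible sketch. It is not Paul's code: `uniqueWith` and its `equals` parameter are names invented for this example, and it is written as a standalone function rather than a prototype extension. Because every candidate is compared against every kept value, it is quadratic in the worst case.

```javascript
// Hypothetical helper: `equals(a, b)` is supplied by the caller and
// returns true when b should count as a duplicate of a.
function uniqueWith(arr, equals) {
    var result = [];
    for (var i = 0; i < arr.length; i++) {
        var found = false;
        // Compare against everything kept so far.
        for (var j = 0; j < result.length; j++) {
            if (equals(result[j], arr[i])) {
                found = true;
                break;
            }
        }
        if (!found) {
            result.push(arr[i]);
        }
    }
    return result;
}

// Usage: treat objects with the same `yada` value as duplicates.
var byYada = uniqueWith(
    [{ yada: "yada" }, { yada: "yada" }, { yada: "other" }],
    function (a, b) { return a.yada === b.yada; }
);
// byYada.length -> 2
```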
3

Although not widely supported yet, you can use a Set (from ECMAScript Harmony). It's native, so it shouldn't have much of a performance impact (e.g. there is no need to search for the actual index like indexOf does). The main advantage is that you can seamlessly use it to keep track of which items you've already seen, including objects, so repeated references to the same object are detected as duplicates:

Array.prototype.unique = function() {
    'use strict';
    var ret = [];
    var used = new Set();
    for (var i = 0; i < this.length; i++) {
        if (!used.has(this[i])) {
            ret.push(this[i]);
            used.add(this[i]);
        }
    }
    return ret;
};

var a = {};
var b = {};
[1, 2, 1, a, a, null, b].unique(); // [1, 2, a, null, b]
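
In engines where Set has since been standardised (ES2015 and later) and its constructor accepts an iterable, the loop can be shortened. The following is a sketch under that assumption; `uniqueViaSet` is an illustrative name, not an existing API.

```javascript
// Assumes an ES2015+ engine: Set accepts an iterable and Array.from exists.
function uniqueViaSet(arr) {
    // A Set keeps insertion order and drops repeated values.
    return Array.from(new Set(arr));
}

var objA = {};
var objB = {};
var deduped = uniqueViaSet([1, "1", 1, objA, objA, null, objB, NaN, NaN]);
// deduped -> [1, "1", objA, null, objB, NaN]
```

Note that 1 and "1" stay distinct, objA and objB stay distinct because they are different references, and the two NaN values collapse into one because Set treats NaN as equal to itself.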
pimvdb
  • This returns `[Object {yada="yada" }, Object { yada="yada" }]` for `[{yada:'yada'},{yada:'yada'}].unique()` (tested in FF14.01) – KooiInc Aug 21 '12 at 06:58
  • @Kooilnc: Yes, those are not the same object but two separate ones. In my example `a` and `b` are also "equal" but only the duplicate `a` is removed. (`a !== b`) – pimvdb Aug 21 '12 at 09:37
  • It is a matter of interpretation I suppose. Programmatically the two are separate, but the content is exactly the same. The programmer has to choose for either one of them. I would say that content counts here, not the fact that the javascript interpreter evaluates the objects to be separate entities, but I suppose there can be arguments for the opposite. – KooiInc Aug 21 '12 at 09:53
  • @Kooilnc: Basically, it's a matter of defining "being equal" vs "being a duplicate", I guess. Checking whether two objects are equal is very fast and concise, whereas checking the actual contents is slow and tedious. I'm afraid your JSON solution isn't very reliable (`[{a:1,b:2},{b:2,a:1}].unique()`). – pimvdb Aug 21 '12 at 10:10
  • You're right. Everything has its price. Could be circumvented by sorting the keys of Object elements before stringifying them. I'll leave that to other geniuses. – KooiInc Aug 21 '12 at 10:51
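
The key-sorting idea from the last comment could look roughly like this. It is only a sketch: `stableStringify` is a hypothetical helper, and it assumes plain JSON-compatible data (no cycles, functions, or undefined values; a Date would serialise as "{}").

```javascript
// Hypothetical helper: serialises a value with object keys in sorted order,
// so {a:1,b:2} and {b:2,a:1} produce the same string.
function stableStringify(value) {
    // Primitives and null: defer to JSON.stringify.
    if (value === null || typeof value !== "object") {
        return JSON.stringify(value);
    }
    // Arrays: keep element order, recurse into each element.
    if (Array.isArray(value)) {
        return "[" + value.map(function (item) {
            return stableStringify(item);
        }).join(",") + "]";
    }
    // Plain objects: emit keys in sorted order.
    var keys = Object.keys(value).sort();
    var parts = keys.map(function (k) {
        return JSON.stringify(k) + ":" + stableStringify(value[k]);
    });
    return "{" + parts.join(",") + "}";
}

var first = stableStringify({ a: 1, b: 2 });
var second = stableStringify({ b: 2, a: 1 });
// first === second -> true
```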
0

Looks fine, but as Felix Kling noted, it only works with arrays containing primitives. This one is a bit more complex, but works for all types, I think:

Array.prototype.unique = function(){
  'use strict';
  var im = {}, uniq = [];
  for (var i=0;i<this.length;i++){
    var type = (this[i]).constructor.name, 
    //          ^note: for IE use this[i].constructor!
        val = type + (!/num|str|regex|bool/i.test(type) 
               ? JSON.stringify(this[i]) 
               : this[i]);
    if (!(val in im)){uniq.push(this[i]);}
    im[val] = 1;
  }
  return uniq;
}
//testing
var test = [1,2,'string',[1,2,3],'string','1','2',
            {yada:'yada'},/[a-z]/i,2,1,1,2,[1,2,3],
            false, new Date(),/[a-b]/i,5,/[a-z]/i,'false']
   ,testunique = test.unique();
//=> testunique now
//   [1,2,string,[1,2,3],1,2,[object Object],
//    /[a-z]/i,false,Mon Aug 20 2012 20:20:11 GMT+0200,
//    /[a-b]/i,5,false]
KooiInc
  • @Kooilnc: Thx for sharing this very complete approach, I do have a question, though: how would this method perform on IE<9? I mean: `(this[i]).constructor.name`... I don't think this'll always work, (events, dom elements,...) I've written my own `Object.getPrototypeOf` method that can get to the constructors of these elements (even the Window constructor and Interfaces), and [posted a little segment of the code here](http://stackoverflow.com/questions/10617014/javascript-event-prototype-in-ie8). – Elias Van Ootegem Aug 21 '12 at 08:11
  • Hi @Elias, for IE<8 this wouldn't run without a JSON shim. I tested it in IE8/9, didn't notice a performance hit. Maybe a jsperf test would reveal, but currently I have no time to make a test. My unique extension *will* be slower in any browser I suppose, but that's the price for being able to consider all types, not only primitives – KooiInc Aug 21 '12 at 08:24
  • @Kooilnc: As always: can't have both, I'm keeping this in mind if ever I find myself in situations where the values are less predictable (currently it's user input, so all 1 dimension and all strings) – Elias Van Ootegem Aug 21 '12 at 08:33
  • I would suspect your `Object.getPrototypeOf` to be slower than getting the `constructor.name` directly, because of the matching. The latter will not work in IE, but just `constructor` would (edited my answer accordingly). – KooiInc Aug 21 '12 at 08:37
  • Yes, it's a lot slower, but as I said, it's only a part of the actual function I use, in some cases `constructor.name` is not available, and I just string it to `[object Window]` for example, and match the `Window` etc... anyway thanks for the feedback, I'll accept this answer as this last comment is actually part of my question and you're the only one who addressed it :) – Elias Van Ootegem Aug 21 '12 at 08:49
  • Thnx, glad I could be of assistance, and glad I could learn a thing or two on the way :) – KooiInc Aug 21 '12 at 08:59
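
As a footnote to the `constructor.name` discussion above: reading `.constructor` throws for null and undefined, so a type tag based on `Object.prototype.toString` may be a safer key prefix. The sketch below assumes an ES5-compliant engine (older IE versions report null and undefined differently), and `typeTag` is a name made up for this example.

```javascript
// Hypothetical helper: extracts "Array" from "[object Array]", etc.
function typeTag(value) {
    // "[object " is 8 characters long; the final "]" is dropped by -1.
    return Object.prototype.toString.call(value).slice(8, -1);
}

var tags = [
    typeTag(1),         // "Number"
    typeTag("1"),       // "String"
    typeTag([1, 2, 3]), // "Array"
    typeTag(/[a-z]/i),  // "RegExp"
    typeTag(null)       // "Null" (no exception, unlike null.constructor)
];
```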