
if I have a large javascript string array that has over 10,000 elements, how do I quickly search through it?

Right now I have a javascript string array that stores the description of a job, and I'm allowing the user to dynamically filter the returned list as they type into an input box.

So say I have a string array like so:
var descArr = {"flipping burgers", "pumping gas", "delivering mail"};

and the user wants to search for: "p"

How would I be able to search a string array that has 10000+ descriptions in it quickly? Obviously I can't sort the description array since they're descriptions, so binary search is out. And since the user can search by "p" or "pi" or any combination of letters, this partial search means that I can't use associative arrays (i.e. searchDescArray["pumping gas"] ) to speed up the search.

Any ideas anyone?

TriFu
  • Do you want to match the search at the beginning of the strings or inside the strings? If the user searches for "p", should it include "flipping burgers" in the result? – Guffa Oct 20 '10 at 08:15
  • descArr is not an array but an object literal. – Q_Mlilo Oct 20 '10 at 08:18
  • @Guffa, yes, if the user searches for "p" it should include "flipping burgers" in the result. I find that the biggest slowdown right now is the actual search. Currently I have a for loop that iterates over the array and does this comparison: if (descArray[i].search("P") > -1) { /* return result */ } – TriFu Oct 20 '10 at 08:59
  • Do it with RegExp - example: http://jsfiddle.net/RnabN/4/ (30k strings, max 100 results) – sod Oct 20 '10 at 12:29

6 Answers


As the regular expression engines in current browsers are remarkably fast, how about doing it that way? Instead of an array, pass one gigantic string and separate the entries with an identifier. Example:

  • String "flipping burgers""pumping gas""delivering mail"
  • Regex: "([^"]*ping[^"]*)"

With the switch /g for global you get all the matches. Make sure the user does not search for your string separator.

You can even add an id into the string with something like:

  • String "11 flipping burgers""12 pumping gas""13 delivering mail"
  • Regex: "(\d+) ([^"]*ping[^"]*)"

  • Example: http://jsfiddle.net/RnabN/4/ (30000 strings, limit results to 100)
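A runnable sketch of the idea (the `search` helper and the metacharacter escaping are my own additions, not from the answer):

```javascript
// Build one big string from the array, quoting each entry.
var descArr = ["flipping burgers", "pumping gas", "delivering mail"];
var haystack = '"' + descArr.join('""') + '"';

// Find every description containing the user's input.
function search(term) {
    // Escape regex metacharacters so user input can't break the pattern.
    var safe = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    var reg = new RegExp('"([^"]*' + safe + '[^"]*)"', 'gi');
    var results = [], m;
    while ((m = reg.exec(haystack)) !== null) {
        results.push(m[1]); // the captured description, without quotes
    }
    return results;
}

search("p"); // matches "flipping burgers" and "pumping gas"
```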

sod
  • The performance is not so much about modern browsers as it is about hardware and user habits. In real life, 2GB of RAM for the computer of an average user gives a different result when compared to a machine of a seasoned developer. IT people keep their computers in good shape. – Saul Oct 20 '10 at 11:03
  • Modern regular expression runtimes are nearly as fast as a precompiled C++ program. There are performance worlds between old JavaScript (Firefox 2/Netscape/Internet Explorer) and the new just-in-time implementations. JavaScript on an average PC with Chrome runs multiple times faster than JavaScript on a high-end PC with Internet Explorer. – sod Oct 20 '10 at 12:04
  • Just as a heads-up to future fiddlers. When modifying your jsfiddle example to also accept ID number, the number character must be double-escaped: new RegExp('"(\\d+) ([^"]*'+search+'[^"]*)"','gi') – Olav Kokovkin Nov 27 '14 at 10:42
  • Hi, can you please tell me how to convert a large array into the 'gigantic string' in your plunker? Your answer is nice, no doubt, but converting the array will also take some time; can you please provide a demo? – Sudarshan Kalebere Jan 23 '18 at 18:35
  • @Sudarshan https://gist.github.com/sod/b05f36fc2de48621686fbcdbaad634db – sod Jan 24 '18 at 20:00

There's no way to speed up an initial array lookup without making some changes. You can speed up consecutive lookups by caching results and mapping them to patterns dynamically.

1.) Adjust your data format. This makes initial lookups somewhat speedier. Basically, you precache.

var data = {
    a : ['Ant farm', 'Ant massage parlor'],
    b : ['Bat farm', 'Bat massage parlor']
    // etc
}

2.) Setup cache mechanics.

var searchFor = function(str, list, caseSensitive, reduce){
    str = str.replace(/(?:^\s*|\s*$)/g, ''); // trim whitespace
    var found = [];
    // anchored test, so no 'g' flag needed (a global regex keeps a sticky
    // lastIndex and would skip matches across repeated .test() calls)
    var reg = new RegExp('^\\s?' + str, caseSensitive ? '' : 'i');
    var i = list.length;
    while(i--){
        if(reg.test(list[i])) found.push(list[i]);
        reduce && list.splice(i, 1);
    }
    return found;
}

var lookUp = function(str, caseSensitive){
    str = str.replace(/(?:^\s*|\s*$)/g, ''); // trim whitespace
    if(data[str]) return data[str];
    var firstChar = caseSensitive ? str[0] : str[0].toLowerCase();
    var list = data[firstChar];
    if(!list) return (data[str] = []);
    // we cache on data since it's already a caching object.
    return (data[str] = searchFor(str, list, caseSensitive)); 
}

3.) Use the following script to create a precache object. I suggest you run this once and use JSON.stringify to create a static cache object. (or do this on the backend)

// we need the searchFor function from above, this might take a while
var preCache = function(arr){
    var chars = "abcdefghijklmnopqrstuvwxyz".split('');
    var cache = {};
    var i = chars.length;
    while(i--){
        // reduce is true, so we're destroying the original list here.
        cache[chars[i]] = searchFor(chars[i], arr, false, true);
    }
    return cache;
}

Probably a bit more code than you expected, but optimisation and performance don't come for free.
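A self-contained sketch of how the precache object from step 1 could be built from a flat array and serialized with JSON.stringify (the grouping helper here is my own illustration):

```javascript
// Group a flat description array by first letter to produce the
// { a: [...], b: [...] } shape used as the precache object.
var descriptions = ['Ant farm', 'Bat farm', 'Ant massage parlor'];
var data = {};
descriptions.forEach(function (desc) {
    var key = desc[0].toLowerCase();
    (data[key] = data[key] || []).push(desc);
});

// Serialize once; embed the result as a static object in the page,
// or generate it on the backend.
var staticCache = JSON.stringify(data);
```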

BGerrissen
  • I'm intrigued by this lookup mechanism. However, I'm confused as to why you only precache each letter? Does this method also cache entire words? Say you want to search for 'bat' in an array of 20,000 strings. – mesqueeb Feb 15 '18 at 11:07

This may not be an answer for you, as I'm making some assumptions about your setup, but if you have server side code and a database, you'd be far better off making an AJAX call back to get the cut down list of results, and using a database to do the filtering (as they're very good at this sort of thing).

As well as the database benefit, you'd also benefit from not outputting this much data (10,000 strings) to a web based front end - if you only return those you require, then you'll save a fair bit of bandwidth.
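A sketch of the client side of this approach; the `debounce` helper is mine, and the endpoint path and `renderResults` function are hypothetical:

```javascript
// A debounce helper: delay the server call until the user pauses typing,
// so each keystroke doesn't fire its own request.
function debounce(fn, ms) {
    var timer;
    return function () {
        var args = arguments, self = this;
        clearTimeout(timer);
        timer = setTimeout(function () { fn.apply(self, args); }, ms);
    };
}

// In the browser it could be wired up like this (endpoint path and
// renderResults are assumptions for illustration):
// input.addEventListener('input', debounce(function (e) {
//     fetch('/jobs/search?q=' + encodeURIComponent(e.target.value))
//         .then(function (res) { return res.json(); })
//         .then(renderResults);
// }, 250));
```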

Paddy
  • A database would need a fulltext index to be suitable for this job, and that is not something databases implement by default, as it costs a lot of storage/memory. It still may be faster to use a normal database simply because it executes code significantly faster than the worst case browser IE6, but if it has to handle a lot of users then it needs a specialized index. – aaaaaaaaaaaa Oct 20 '10 at 09:40
  • @eBusiness - A full text index would not be required for this. On SQL Server a query with WHERE title like 'P%' would still be SARGable and would use an index on this column if one were present. It's also faster because you are not transmitting all 10000 across the wire to the client prior to processing, only the cut down list. – Paddy Dec 05 '13 at 13:10
  • After thinking for 3 years, that is your best retort? %P% is not sargable, and I believe that is what the asker wanted. – aaaaaaaaaaaa Dec 05 '13 at 22:36
  • Funny, just got report of that comment yesterday... %p% is not sargable, but I believe p% is. Mind you when I read the question again, I see that I am wrong... – Paddy Dec 06 '13 at 09:56

I can't reproduce the problem. I created a naive implementation, and most browsers do the search across 10,000 15-character strings in a single-digit number of milliseconds. I can't test in IE6, but I wouldn't believe it to be more than 100 times slower than the fastest browsers, which would still be virtually instant.

Try it yourself: http://ebusiness.hopto.org/test/stacktest8.htm (Note that the creation time is not relevant to the issue, that is just there to get some data to work on.)

One thing you could do wrong is trying to render all results, that would be quite a huge job when the user has only entered a single letter, or a common letter combination.
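A sketch of a naive linear filter that caps how many results are collected, so a one-letter search doesn't try to render thousands of matches (the function name and cap value are my own choices, not from the answer):

```javascript
// Linear scan with an early exit once enough matches are found.
function filterLimited(arr, term, limit) {
    var lower = term.toLowerCase();
    var out = [];
    for (var i = 0; i < arr.length && out.length < limit; i++) {
        if (arr[i].toLowerCase().indexOf(lower) > -1) out.push(arr[i]);
    }
    return out;
}

var jobs = ['flipping burgers', 'pumping gas', 'delivering mail'];
filterLimited(jobs, 'p', 100); // matches the two descriptions containing "p"
```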

aaaaaaaaaaaa

I suggest trying a ready-made JS function, for example the autocomplete from jQuery. It's fast and it has many options to configure.

Check out the jQuery autocomplete demo

mohdajami
  • Hi medopal, thanks for the suggestion, but jQuery autocomplete is only fast when there are relatively few entries; in the order of 10K+ it becomes slow as well. – TriFu Oct 20 '10 at 08:49

Using a Set for large datasets (1M+) is around 3,500 times faster than Array's .includes().

You must use a Set if you want speed.

I just wrote a node script that needs to look up a string in a 1.3M array.

Using Array's .includes for 10K lookups: 39.27 seconds

Using Set .has for 10K lookups: 0.01084 seconds

Use a Set.
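A minimal illustration; note that Set.has only handles exact, whole-string lookups, so on its own it wouldn't cover the partial matching the question asks about:

```javascript
// Build a Set once; membership checks are then O(1) instead of the O(n)
// scan that Array.prototype.includes performs.
var descriptions = ['flipping burgers', 'pumping gas', 'delivering mail'];
var descSet = new Set(descriptions);

var hasExact = descSet.has('pumping gas'); // true, constant time
var hasPartial = descSet.has('pumping');   // false: no substring matching
```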

Vlad Lego