2

Background

I developed a jQuery-based shuttle widget for HTML select elements because I could not find one that was minimally codified and offered a regular expression filter that compensated for diacritics.

Problem

When a few thousand entries are added to the select, the regular expression filter slows to a crawl. You can see the problem as follows:

  1. Browse to: http://jsfiddle.net/U8Xre/2/
  2. Click the input field in the result panel.
  3. Type any regular expression (e.g., ^a.*ai).

Code

I believe the culprit is lurking here:

var options = $src.empty().scrollTop( 0 ).data( "options" );
var search = $.trim( $input.val() );
var regex = new RegExp( search, 'gi' );
var len = options.length;
var $html = $(document.createElement( 'option' ));
for( var i = 0; i < len; i++ ) {
  var o = options[ i ];
  if( o.text.dediacritics().match( regex ) !== null ) {
    $src.append( $html.clone().text( o.text ).val( o.value ) );
  }
}
$src.css( 'width', $input.width() + 4 );

Where $src is the source $('#select') and String.prototype.dediacritics is defined as in the fiddle. The code above runs for every single keypress. There is one more relevant snippet:

// Create a copy of the source options to use when matching the regex.
var $options = [];
$src.find( "option" ).each( function() {
  $options.push( { value: $(this).val(), text: $(this).text() } );
});
$src.data( "options", $options );

This makes a copy of the options from the source list, but only runs once. (This results in a duplication bug when shuttling options, but adding the above code into the input event handler would slow the filter even more.)

Question

How can the code be made to perform regular expression filtering on lists up to 5,000 words in nearly real-time?

Thank you!

Community
  • 1
  • 1
Dave Jarvis
  • 30,436
  • 41
  • 178
  • 315
  • Just never add thousands of entries to a ` – Bergi Nov 26 '12 at 00:49
  • 1
    This is for a maintenance screen used by a few people. Otherwise, they'd have to hand-categorize items directly in the database. I'm not looking for different UX solutions: everything works, albeit slowly. – Dave Jarvis Nov 26 '12 at 00:50
  • My guess is it isn't the regex that's the bottleneck. Can you narrow the bottleneck down a bit more? – Cameron Nov 26 '12 at 01:03
  • May I ask: Where did you find that fancy `dediacritics` function, is that maintained somewhere? – Bergi Dec 13 '12 at 23:51
  • @Bergi: http://stackoverflow.com/a/5912746/59087 ;-) – Dave Jarvis Dec 14 '12 at 05:04

3 Answers3

1

I suggest you to

  • create a multi-line string containing a list of all option names, each in a separate line
  • apply regex to this multi-line string to filter its content by removing non-matching lines
  • update html with matching lines as options of select element
Ωmega
  • 42,614
  • 34
  • 134
  • 203
1

I'd guess the harder job is the repeated call to dediacritics() (with its many regex-replaces) than doing the search (though I haven't done any profiling). So, you should cache these de-diacriticified strings and search only through them. Btw, test is usually faster than match.

Also, you should avoid as many DOM operations as possible - you have a lot of them when emptying and re-appending the whole options list onkeypress.

// once:
var options = [],
    src = $src[0]; // or whatever to get the DOM element
$.each( src.options, function() {
    options.push( { el: this, text: $(this).text().dediacritics(), hidden:false } );
});
// you might put it on the element via .data(), but need not

// onkeypress:
var regex = new RegExp( $.trim($input.val()), 'i' );
var curEl = src.firstChild;
for (var i=0; i<options.length; i++) {
    var option = options[i];
    if (regex.test( option.text )) {
        if (option.hidden)
            src.insertBefore(option.el, curEl);
        curEl = option.el.nextSibling;
        option.hidden = false;
    } else {
        if (!option.hidden) {
            curEl = option.el.nextSibling;
            src.removeChild(option.el);
        }
        option.hidden = true;
    }
}

Demo: This is blazingly fast ("realtime"), but you can feel the time it needs to construct the options array when calling dediacritics() 5000 times.

Bergi
  • 630,263
  • 148
  • 957
  • 1,375
  • Very nice; might have to optimize this a bit more for fixing the shuttling bug. Looks super-fast. – Dave Jarvis Nov 26 '12 at 03:23
  • 1
    Right, I've missed that. I'd recommend just another binary property to store in which list the `option` currently is, and just `continue` the for-loop then. – Bergi Nov 26 '12 at 14:07
  • There is another bug: this will exclude every other item that matches the RegEx. If you had `Toy Box`, `Toy Story`, and `Toy Zebra`, then `^Toy` will not show `Toy Story` in the list. – Dave Jarvis Dec 10 '12 at 05:02
  • 1
    Thanks for the hint. Cause of the problem was the `lastIndex` flag on the (unnecessary) *global* regex, see http://stackoverflow.com/q/1520800/1048572. Fixed now. – Bergi Dec 10 '12 at 17:24
  • @ridgerunner: Removing the `global` flag (as I did) does also work – Bergi Dec 10 '12 at 18:08
  • @Bergi - quite right. (forgot about how the g flag affects this) I stand corrected. Thanks! – ridgerunner Dec 10 '12 at 20:47
1

A minor comment, if you aren't using the result of the regex match, then you should use a regex test:

  if( o.text.dediacritics().match( regex ) !== null ) {

use a test:

  if( regex.test(o.text.dediacritics()) ) {
Kernel James
  • 3,752
  • 25
  • 32
  • 1
    No, this answer is missing a crucial initialization and running it as-is will lead to erroneous behavior. Like `RegExp.exec()`, the `RegExp.test()` method uses the regexp instance object's `lastIndex` property to determine where to begin its search on any string and `test()` does _NOT_ reset this property after a successful match (it does reset it on match failure). Subsequent tests (on other strings) following a successful match will begin at a non-zero location - ERROR! To work correctly, this answer needs to add: `regex.lastIndex=0;` somewhere before: `regex.test()`. – ridgerunner Dec 10 '12 at 17:48
  • 1
    Additionally, as @Bergi correctly points out, you could alternately build the regex with the global `g` flag off - which also forces the reset of: `RegExp.lastIndex` each time it is run. – ridgerunner Dec 10 '12 at 20:52