There's nothing that can beat regular expressions for this kind of job. However, they have two problems: they are hard to maintain (as you pointed out in your post), and very large ones can perform poorly. I don't know how many alternatives a single regexp can handle, but I'd guess that up to 20-30 would be fine in any case.
So you need some code that builds the regular expressions dynamically from a data structure, which can be an array or just a string. I personally would prefer the string, because it's the easiest to maintain.
```javascript
// taken from http://www.ranks.nl/resources/stopwords.html
var stops = ""
  + "a about above after again against all am an and any are aren't as "
  + "at be because been before being below between both but by can't "
  + "cannot could couldn't did didn't do does doesn't doing don't down "
  + "during each few for from further had hadn't has hasn't have "
  + "haven't having he he'd he'll he's her here here's hers herself "
  + "him himself his how how's i i'd i'll i'm i've if in into is isn't "
  + "it it's its itself let's me more most mustn't my myself no nor "
  + "not of off on once only or other ought our ours ourselves out "
  + "over own same shan't she she'd she'll she's should shouldn't so "
  + "some such than that that's the their theirs them themselves then "
  + "there there's these they they'd they'll they're they've this "
  + "those through to too under until up very was wasn't we we'd we'll "
  + "we're we've were weren't what what's when when's where where's "
  + "which while who who's whom why why's with won't would wouldn't "
  + "you you'd you'll you're you've your yours yourself yourselves ";

// how many words to put into one regex
var reSize = 20;

// build the regexps; longest words go first, so that e.g. "he'd" is
// removed before "he" can match its first two letters
var regexes = [];
stops = stops.match(/\S+/g).sort(function (a, b) { return b.length - a.length; });
for (var n = 0; n < stops.length; n += reSize) {
  regexes.push(new RegExp("\\b(" + stops.slice(n, n + reSize).join("|") + ")\\b", "gi"));
}
```
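The same build step can also be wrapped in a reusable function. This is only a sketch: `buildStopRegexes` and `escapeRegExp` are hypothetical names, and the escaping of regex metacharacters is my own addition. The stop list above contains only letters and apostrophes, so escaping isn't strictly needed there, but it makes sure a word containing something like `.` or `?` is matched literally instead of being read as regex syntax.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: split a whitespace-separated stop-word string into
// case-insensitive, whole-word regexes with at most `chunkSize` alternatives each.
function buildStopRegexes(stopString, chunkSize) {
  // Longest words first, as in the code above.
  var words = stopString.match(/\S+/g) || [];
  words.sort(function (a, b) { return b.length - a.length; });

  var regexes = [];
  for (var i = 0; i < words.length; i += chunkSize) {
    var alternatives = words.slice(i, i + chunkSize).map(escapeRegExp).join("|");
    // Non-capturing group, since the matched text is thrown away anyway.
    regexes.push(new RegExp("\\b(?:" + alternatives + ")\\b", "gi"));
  }
  return regexes;
}

var built = buildStopRegexes("a an the he he'd", 2);
console.log(built.length);    // 3
console.log(built[0].source); // \b(?:he'd|the)\b
```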
Once you've got this, the rest is obvious:
```javascript
// `text` is the string you want to strip the stop words from
regexes.forEach(function (r) {
  text = text.replace(r, '');
});
```
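Replacing each match with an empty string leaves the surrounding spaces behind. Here is a small self-contained run-through; the tiny stop list, the sample sentence, `removeStopWords` and the whitespace clean-up at the end are my own additions for illustration, while the build loop mirrors the code above.

```javascript
// Tiny illustrative stop list (not the full list above).
var stopString = "he he'd the a";
var chunkSize = 2;

// Build: sort longest first, then cut into chunks of `chunkSize` words.
var words = stopString.match(/\S+/g).sort(function (a, b) { return b.length - a.length; });
var regexes = [];
for (var i = 0; i < words.length; i += chunkSize) {
  regexes.push(new RegExp("\\b(" + words.slice(i, i + chunkSize).join("|") + ")\\b", "gi"));
}

// Apply every regex in turn, then collapse the runs of whitespace left behind.
function removeStopWords(text) {
  regexes.forEach(function (r) {
    text = text.replace(r, "");
  });
  return text.replace(/\s{2,}/g, " ").trim();
}

console.log(removeStopWords("He'd read the book a day ago")); // → read book day ago
```

Because `he'd` ends up in an earlier regex than `he`, "He'd" is removed as a whole word. If `he` were tried first, `\b` would also match between the `e` and the apostrophe, and the sentence would start with a stray `'d`.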
You'll need to experiment with the `reSize` value to find the best balance between the length of each regex and the total number of regexes. If performance is critical, you can also run the generation part once and cache the results (i.e. the generated regexps) somewhere.
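One way to do that caching is to memoize the builder, keyed on the stop string and the chunk size, so the regexes are compiled only once per process. This is a sketch: `getStopRegexes`, `regexCache` and `timeChunkSize` are hypothetical names, and the `Date.now()` loop is only a crude starting point for comparing `reSize` values, not a rigorous benchmark.

```javascript
// Cache of compiled regex arrays, keyed by chunk size + stop string.
var regexCache = new Map();

// Hypothetical memoized builder: compiles the regexes on first use only.
function getStopRegexes(stopString, chunkSize) {
  var key = chunkSize + "\u0000" + stopString;
  var cached = regexCache.get(key);
  if (cached) return cached;

  var words = stopString.match(/\S+/g).sort(function (a, b) { return b.length - a.length; });
  var regexes = [];
  for (var i = 0; i < words.length; i += chunkSize) {
    regexes.push(new RegExp("\\b(" + words.slice(i, i + chunkSize).join("|") + ")\\b", "gi"));
  }
  regexCache.set(key, regexes);
  return regexes;
}

// Reusing global regexes is safe here: replace() resets lastIndex before matching.
function removeStopWords(text, stopString, chunkSize) {
  getStopRegexes(stopString, chunkSize).forEach(function (r) {
    text = text.replace(r, "");
  });
  return text;
}

// Rough timing: run the removal `iterations` times and return elapsed milliseconds.
function timeChunkSize(text, stopString, chunkSize, iterations) {
  var start = Date.now();
  for (var n = 0; n < iterations; n++) {
    removeStopWords(text, stopString, chunkSize);
  }
  return Date.now() - start;
}

var stops = "a about above after again against all am an and any are";
var sample = "All the answers are above, and again below";
[5, 10, 20].forEach(function (size) {
  console.log(size, timeChunkSize(sample, stops, size, 10000) + " ms");
});
```

On a real workload you'd time your own texts with the full stop list; the numbers from a toy sentence like this one say little about the optimum.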