This is pretty complex, because no matter how you slice it you either have a ton of Unicode letters to include or a ton of Unicode special characters to exclude. What you essentially need here is a regex that only allows characters from the Unicode general categories for letters (Lu, Ll, Lt, Lm, Lo), plus the handful of punctuation characters you want to keep.
In some regex flavors support for Unicode general categories is built in, and your regex would just be something like the following:
[\p{Ll}\p{Lu}\p{Lt}\p{Lm}\p{Lo}'\- _]
Unfortunately JavaScript did not support this before ES2018 added Unicode property escapes (\p{...} together with the u flag). For older engines you could do this with the Unicode addon to the XRegExp library. The usage would look something like this (for filtering out all of the characters you do not want):
XRegExp.replace(text, XRegExp("[^\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lm}\\p{Lo}'\\- _]"), '', 'all');
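On engines that implement ES2018 Unicode property escapes, the same filter can probably be written without any library. The following is only a minimal sketch under that assumption; the function name stripSpecial is made up for illustration, and \p{L} is used as shorthand for the five letter categories listed above.

```javascript
// Assumes an ES2018+ engine: \p{...} only works together with the `u` flag.
// \p{L} is the union of Lu, Ll, Lt, Lm and Lo.
const disallowed = /[^\p{L}'\- _]/gu;

// Hypothetical helper: removes everything except letters, apostrophe,
// hyphen, space and underscore.
function stripSpecial(text) {
  return text.replace(disallowed, '');
}

// "\u00eb" is a precomposed e-diaeresis; "\u2605" is a black star symbol.
console.log(stripSpecial("Zo\u00eb_O'Brien-Smith\u260542!"));
```

Note that combining marks (category Mn) are not letters, so text in decomposed form would lose its accents; normalizing with text.normalize('NFC') first may be worth considering.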
Alternatively, if you want to construct a crazy long JavaScript regex that does the job without Unicode category support, the CSET JavaScript library can be used. Here is the regex I came up with:
var regex = /[\u0000-\u001f!-&(-,.-@[-^`{-©«-´¶-¹»-¿×÷˂-˅˒-˟˥-˫˭˯-\u036f͵\u0378-\u0379;-΅·\u038b\u038d\u03a2϶҂-\u0489\u0524-\u0530\u0557-\u0558՚-\u0560\u0588-\u05cf\u05eb-\u05ef׳-\u0620\u064b-٭\u0670۔\u06d6-\u06e4\u06e7-\u06ed۰-۹۽-۾܀-\u070f\u0711\u0730-\u074c\u07a6-\u07b0\u07b2-߉\u07eb-\u07f3߶-߹\u07fb-\u0903\u093a-\u093c\u093e-\u094f\u0951-\u0957\u0962-॰\u0973-\u097a\u0980-\u0984\u098d-\u098e\u0991-\u0992\u09a9\u09b1\u09b3-\u09b5\u09ba-\u09bc\u09be-\u09cd\u09cf-\u09db\u09de\u09e2-৯৲-\u0a04\u0a0b-\u0a0e\u0a11-\u0a12\u0a29\u0a31\u0a34\u0a37\u0a3a-\u0a58\u0a5d\u0a5f-\u0a71\u0a75-\u0a84\u0a8e\u0a92\u0aa9\u0ab1\u0ab4\u0aba-\u0abc\u0abe-\u0acf\u0ad1-\u0adf\u0ae2-\u0b04\u0b0d-\u0b0e\u0b11-\u0b12\u0b29\u0b31\u0b34\u0b3a-\u0b3c\u0b3e-\u0b5b\u0b5e\u0b62-୰\u0b72-\u0b82\u0b84\u0b8b-\u0b8d\u0b91\u0b96-\u0b98\u0b9b\u0b9d\u0ba0-\u0ba2\u0ba5-\u0ba7\u0bab-\u0bad\u0bba-\u0bcf\u0bd1-\u0c04\u0c0d\u0c11\u0c29\u0c34\u0c3a-\u0c3c\u0c3e-\u0c57\u0c5a-\u0c5f\u0c62-\u0c84\u0c8d\u0c91\u0ca9\u0cb4\u0cba-\u0cbc\u0cbe-\u0cdd\u0cdf\u0ce2-\u0d04\u0d0d\u0d11\u0d29\u0d3a-\u0d3c\u0d3e-\u0d5f\u0d62-൹\u0d80-\u0d84\u0d97-\u0d99\u0db2\u0dbc\u0dbe-\u0dbf\u0dc7-\u0e00\u0e31\u0e34-฿\u0e47-\u0e80\u0e83\u0e85-\u0e86\u0e89\u0e8b-\u0e8c\u0e8e-\u0e93\u0e98\u0ea0\u0ea4\u0ea6\u0ea8-\u0ea9\u0eac\u0eb1\u0eb4-\u0ebc\u0ebe-\u0ebf\u0ec5\u0ec7-\u0edb\u0ede-\u0eff༁-\u0f3f\u0f48\u0f6d-\u0f87\u0f8c-\u0fff\u102b-\u103e၀-၏\u1056-\u1059\u105e-\u1060\u1062-\u1064\u1067-\u106d\u1071-\u1074\u1082-\u108d\u108f-႟\u10c6-\u10cf჻\u10fd-\u10ff\u115a-\u115e\u11a3-\u11a7\u11fa-\u11ff\u1249\u124e-\u124f\u1257\u1259\u125e-\u125f\u1289\u128e-\u128f\u12b1\u12b6-\u12b7\u12bf\u12c1\u12c6-\u12c7\u12d7\u1311\u1316-\u1317\u135b-\u137f᎐-\u139f\u13f5-\u1400᙭-᙮\u1677-\u1680᚛-\u169f᛫-\u16ff\u170d\u1712-\u171f\u1732-\u173f\u1752-\u175f\u176d\u1771-\u177f\u17b4-៖៘-៛\u17dd-\u181f\u1878-\u187f\u18a9\u18ab-\u18ff\u191d-᥏\u196e-\u196f\u1975-\u197f\u19aa-\u19c0\u19c8-᧿\u1a17-\u1b04\u1b34-\u1b44\u1b4c-\u1b82\u1ba1-\u1bad᮰-\u1bff\u1c24-\u1c4c᱐-᱙᱾-\u1cff\u1dc0
-\u1dff\u1f16-\u1f17\u1f1e-\u1f1f\u1f46-\u1f47\u1f4e-\u1f4f\u1f58\u1f5a\u1f5c\u1f5e\u1f7e-\u1f7f\u1fb5᾽᾿-῁\u1fc5῍-῏\u1fd4-\u1fd5\u1fdc-῟῭-\u1ff1\u1ff5´-⁰\u2072-⁾₀-\u208f\u2095-℁℃-℆℈-℉℔№-℘℞-℣℥℧℩℮℺-℻⅀-⅄⅊-⅍⅏-\u2182\u2185-\u2bff\u2c2f\u2c5f\u2c70\u2c7e-\u2c7f⳥-⳿\u2d26-\u2d2f\u2d66-\u2d6e\u2d70-\u2d7f\u2d97-\u2d9f\u2da7\u2daf\u2db7\u2dbf\u2dc7\u2dcf\u2dd7\u2ddf-⸮⸰-〄\u3007-〰〶-\u303a〽-\u3040\u3097-゜゠・\u3100-\u3104\u312e-\u3130\u318f-㆟\u31b8-\u31ef㈀-㏿\u4db6-䷿\u9fc4-\u9fff\ua48d-\ua4ff꘍-꘏꘠-꘩\ua62c-\ua63f\ua660-\ua661\ua66f-꙾\ua698-꜖꜠-꜡꞉-꞊\ua78d-\ua7fa\ua802\ua806\ua80b\ua823-\ua83f꡴-\ua881\ua8b4-꤉\ua926-꤯\ua947-\ua9ff\uaa29-\uaa3f\uaa43\uaa4c-\uabff\ud7a4-\ud7ff\ud840-\ud868\udc00-\uf8ff\ufa2e-\ufa2f\ufa6b-\ufa6f\ufada-\ufaff\ufb07-\ufb12\ufb18-\ufb1c\ufb1e﬩\ufb37\ufb3d\ufb3f\ufb42\ufb45\ufbb2-\ufbd2﴾-\ufd4f\ufd90-\ufd91\ufdc8-\ufdef﷼-\ufe6f\ufe75\ufefd-@[-`{-・\uffbf-\uffc1\uffc8-\uffc9\uffd0-\uffd1\uffd8-\uffd9\uffdd-\uffff]|[\ud803-\ud807\ud809-\ud834\ud836-\ud83f\ud86a-\ud87d\ud87f-\udbff][\udc00-\udfff]|\ud800[\udc0c\udc27\udc3b\udc3e\udc4e-\udc4f\udc5e-\udc7f\udcfb-\ude7f\ude9d-\ude9f\uded1-\udeff\udf1f-\udf2f\udf41\udf4a-\udf7f\udf9e-\udf9f\udfc4-\udfc7\udfd0-\udfff]|\ud801[\udc9e-\udfff]|\ud802[\udc06-\udc07\udc09\udc36\udc39-\udc3b\udc3d-\udc3e\udc40-\udcff\udd16-\udd1f\udd3a-\uddff\ude01-\ude0f\ude14\ude18\ude34-\udfff]|\ud808[\udf6f-\udfff]|\ud835[\udc55\udc9d\udca0-\udca1\udca3-\udca4\udca7-\udca8\udcad\udcba\udcbc\udcc4\udd06\udd0b-\udd0c\udd15\udd1d\udd3a\udd3f\udd45\udd47-\udd49\udd51\udea6-\udea7\udec1\udedb\udefb\udf15\udf35\udf4f\udf6f\udf89\udfa9\udfc3\udfcc-\udfff]|\ud869[\uded7-\udfff]|\ud87e[\ude1e-\udfff]|[\ud800-\ud83f\ud869-\udbff]/g;
And the steps to get there (after including the CSET source):
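// Make the CSET helpers used below available as globals
CSET.import();
// Build one character set that is the union of all five letter categories
var allUnicodeLetters = ['Lu', 'Ll', 'Lt', 'Lm', 'Lo'].map(fromUnicodeGeneralCategory).reduce(union);
// Add the extra characters that should survive: apostrophe, hyphen, space, underscore
var allAllowedCharacters = union(allUnicodeLetters, fromString("'- _"));
// Match everything that is NOT allowed, globally
var regex = new RegExp(toRegex(complement(allAllowedCharacters)), 'g');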
Then you could use str = str.replace(regex, '') and it would remove every character except the ones you want to allow, so symbols such as dingbats are stripped as well.
Edit: Just realized you may also want to allow numbers. If so, you could use the following, which was obtained by adding 'Nd' (decimal digits) and 'Nl' (letter numbers such as Roman numerals) to the list of categories in the method above:
var regex = /[\u0000-\u001f!-&(-,.-/:-@[-^`{-©«-´¶-¹»-¿×÷˂-˅˒-˟˥-˫˭˯-\u036f͵\u0378-\u0379;-΅·\u038b\u038d\u03a2϶҂-\u0489\u0524-\u0530\u0557-\u0558՚-\u0560\u0588-\u05cf\u05eb-\u05ef׳-\u0620\u064b-\u065f٪-٭\u0670۔\u06d6-\u06e4\u06e7-\u06ed۽-۾܀-\u070f\u0711\u0730-\u074c\u07a6-\u07b0\u07b2-\u07bf\u07eb-\u07f3߶-߹\u07fb-\u0903\u093a-\u093c\u093e-\u094f\u0951-\u0957\u0962-॥॰\u0973-\u097a\u0980-\u0984\u098d-\u098e\u0991-\u0992\u09a9\u09b1\u09b3-\u09b5\u09ba-\u09bc\u09be-\u09cd\u09cf-\u09db\u09de\u09e2-\u09e5৲-\u0a04\u0a0b-\u0a0e\u0a11-\u0a12\u0a29\u0a31\u0a34\u0a37\u0a3a-\u0a58\u0a5d\u0a5f-\u0a65\u0a70-\u0a71\u0a75-\u0a84\u0a8e\u0a92\u0aa9\u0ab1\u0ab4\u0aba-\u0abc\u0abe-\u0acf\u0ad1-\u0adf\u0ae2-\u0ae5\u0af0-\u0b04\u0b0d-\u0b0e\u0b11-\u0b12\u0b29\u0b31\u0b34\u0b3a-\u0b3c\u0b3e-\u0b5b\u0b5e\u0b62-\u0b65୰\u0b72-\u0b82\u0b84\u0b8b-\u0b8d\u0b91\u0b96-\u0b98\u0b9b\u0b9d\u0ba0-\u0ba2\u0ba5-\u0ba7\u0bab-\u0bad\u0bba-\u0bcf\u0bd1-\u0be5௰-\u0c04\u0c0d\u0c11\u0c29\u0c34\u0c3a-\u0c3c\u0c3e-\u0c57\u0c5a-\u0c5f\u0c62-\u0c65\u0c70-\u0c84\u0c8d\u0c91\u0ca9\u0cb4\u0cba-\u0cbc\u0cbe-\u0cdd\u0cdf\u0ce2-\u0ce5\u0cf0-\u0d04\u0d0d\u0d11\u0d29\u0d3a-\u0d3c\u0d3e-\u0d5f\u0d62-\u0d65൰-൹\u0d80-\u0d84\u0d97-\u0d99\u0db2\u0dbc\u0dbe-\u0dbf\u0dc7-\u0e00\u0e31\u0e34-฿\u0e47-๏๚-\u0e80\u0e83\u0e85-\u0e86\u0e89\u0e8b-\u0e8c\u0e8e-\u0e93\u0e98\u0ea0\u0ea4\u0ea6\u0ea8-\u0ea9\u0eac\u0eb1\u0eb4-\u0ebc\u0ebe-\u0ebf\u0ec5\u0ec7-\u0ecf\u0eda-\u0edb\u0ede-\u0eff༁-༟༪-\u0f3f\u0f48\u0f6d-\u0f87\u0f8c-\u0fff\u102b-\u103e၊-၏\u1056-\u1059\u105e-\u1060\u1062-\u1064\u1067-\u106d\u1071-\u1074\u1082-\u108d\u108f\u109a-႟\u10c6-\u10cf჻\u10fd-\u10ff\u115a-\u115e\u11a3-\u11a7\u11fa-\u11ff\u1249\u124e-\u124f\u1257\u1259\u125e-\u125f\u1289\u128e-\u128f\u12b1\u12b6-\u12b7\u12bf\u12c1\u12c6-\u12c7\u12d7\u1311\u1316-\u1317\u135b-\u137f᎐-\u139f\u13f5-\u1400᙭-᙮\u1677-\u1680᚛-\u169f᛫-᛭\u16f1-\u16ff\u170d\u1712-\u171f\u1732-\u173f\u1752-\u175f\u176d\u1771-\u177f\u17b4-៖៘-៛\u17dd-\u17df\u17ea-\u180f\u181a-\u181f\u1878-\u187f\u18a9\u18ab
-\u18ff\u191d-᥅\u196e-\u196f\u1975-\u197f\u19aa-\u19c0\u19c8-\u19cf\u19da-᧿\u1a17-\u1b04\u1b34-\u1b44\u1b4c-\u1b4f᭚-\u1b82\u1ba1-\u1bad\u1bba-\u1bff\u1c24-᰿\u1c4a-\u1c4c᱾-\u1cff\u1dc0-\u1dff\u1f16-\u1f17\u1f1e-\u1f1f\u1f46-\u1f47\u1f4e-\u1f4f\u1f58\u1f5a\u1f5c\u1f5e\u1f7e-\u1f7f\u1fb5᾽᾿-῁\u1fc5῍-῏\u1fd4-\u1fd5\u1fdc-῟῭-\u1ff1\u1ff5´-⁰\u2072-⁾₀-\u208f\u2095-℁℃-℆℈-℉℔№-℘℞-℣℥℧℩℮℺-℻⅀-⅄⅊-⅍⅏-⅟\u2189-\u2bff\u2c2f\u2c5f\u2c70\u2c7e-\u2c7f⳥-⳿\u2d26-\u2d2f\u2d66-\u2d6e\u2d70-\u2d7f\u2d97-\u2d9f\u2da7\u2daf\u2db7\u2dbf\u2dc7\u2dcf\u2dd7\u2ddf-⸮⸰-〄〈-〠\u302a-〰〶-〷〽-\u3040\u3097-゜゠・\u3100-\u3104\u312e-\u3130\u318f-㆟\u31b8-\u31ef㈀-㏿\u4db6-䷿\u9fc4-\u9fff\ua48d-\ua4ff꘍-꘏\ua62c-\ua63f\ua660-\ua661\ua66f-꙾\ua698-꜖꜠-꜡꞉-꞊\ua78d-\ua7fa\ua802\ua806\ua80b\ua823-\ua83f꡴-\ua881\ua8b4-꣏\ua8da-\ua8ff\ua926-꤯\ua947-\ua9ff\uaa29-\uaa3f\uaa43\uaa4c-\uaa4f\uaa5a-\uabff\ud7a4-\ud7ff\ud840-\ud868\udc00-\uf8ff\ufa2e-\ufa2f\ufa6b-\ufa6f\ufada-\ufaff\ufb07-\ufb12\ufb18-\ufb1c\ufb1e﬩\ufb37\ufb3d\ufb3f\ufb42\ufb45\ufbb2-\ufbd2﴾-\ufd4f\ufd90-\ufd91\ufdc8-\ufdef﷼-\ufe6f\ufe75\ufefd-/:-@[-`{-・\uffbf-\uffc1\uffc8-\uffc9\uffd0-\uffd1\uffd8-\uffd9\uffdd-\uffff]|[\ud803-\ud807\ud80a-\ud834\ud836-\ud83f\ud86a-\ud87d\ud87f-\udbff][\udc00-\udfff]|\ud800[\udc0c\udc27\udc3b\udc3e\udc4e-\udc4f\udc5e-\udc7f\udcfb-\udd3f\udd75-\ude7f\ude9d-\ude9f\uded1-\udeff\udf1f-\udf2f\udf4b-\udf7f\udf9e-\udf9f\udfc4-\udfc7\udfd0\udfd6-\udfff]|\ud801[\udc9e-\udc9f\udcaa-\udfff]|\ud802[\udc06-\udc07\udc09\udc36\udc39-\udc3b\udc3d-\udc3e\udc40-\udcff\udd16-\udd1f\udd3a-\uddff\ude01-\ude0f\ude14\ude18\ude34-\udfff]|\ud808[\udf6f-\udfff]|\ud809[\udc63-\udfff]|\ud835[\udc55\udc9d\udca0-\udca1\udca3-\udca4\udca7-\udca8\udcad\udcba\udcbc\udcc4\udd06\udd0b-\udd0c\udd15\udd1d\udd3a\udd3f\udd45\udd47-\udd49\udd51\udea6-\udea7\udec1\udedb\udefb\udf15\udf35\udf4f\udf6f\udf89\udfa9\udfc3\udfcc-\udfcd]|\ud869[\uded7-\udfff]|\ud87e[\ude1e-\udfff]|[\ud800-\ud83f\ud869-\udbff]/g;
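For comparison, the category-list approach can be sketched with native property escapes instead of a generated literal. This assumes an ES2018+ engine, and buildDisallowedRegex is a hypothetical helper written for this illustration; it is not part of CSET or XRegExp.

```javascript
// Hypothetical helper (not a CSET or XRegExp function).
// categories: Unicode general category names, e.g. ['Lu', 'Nd'].
// extraChars: additional literal characters to allow.
function buildDisallowedRegex(categories, extraChars) {
  // Turn each category name into a \p{...} property escape.
  const categoryPart = categories.map((c) => `\\p{${c}}`).join('');
  // Escape the characters that are special inside a character class.
  const extraPart = extraChars.replace(/[\\\]\[^-]/g, '\\$&');
  // Negated class: matches anything that is not allowed.
  return new RegExp(`[^${categoryPart}${extraPart}]`, 'gu');
}

const letters = ['Lu', 'Ll', 'Lt', 'Lm', 'Lo'];
const lettersOnly = buildDisallowedRegex(letters, "'- _");
// Adding 'Nd' and 'Nl' keeps decimal digits and letter numbers too.
const lettersAndNumbers = buildDisallowedRegex(letters.concat(['Nd', 'Nl']), "'- _");

// "\u2708" is an airplane dingbat; "\u2167" is ROMAN NUMERAL EIGHT (category Nl).
const sample = "Room_42_\u2708_\u2167!";
console.log(sample.replace(lettersOnly, ''));
console.log(sample.replace(lettersAndNumbers, ''));
```

Unlike a pasted literal, this version follows whatever Unicode version the engine ships, so the exact set of matched characters may differ slightly from the generated regexes above.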