
Is there some way of looping through all the known characters, filtering out the letters & numbers, then pushing them to an array or string?

Something like…

const myChars = [some method to generate every QWERTY character I could type]
let regex = new RegExp([anything not a letter or number])
const aRaw = myChars.split("")
const clean = aRaw.filter((e) => e.match(regex))
return clean // leaving me with a master array of all special characters! Muhhahahaha

Note! I do not want to REMOVE these characters from a string. I want to "automagically" generate them. Not "randomly gen characters" from a given string, either.

Selino
  • What do you mean by "known characters", exactly? If you want letters and numbers, that's `/[a-zA-Z\d]/`. What do you mean by "automagically generating" them? It'd be helpful if you could provide expected output given some input. – ggorlen Jul 08 '20 at 18:54
  • I want the opposite of what you wrote: anything that is not a letter or number. I want an array with every character that can be typed into a text area on a QWERTY keyboard that IS NOT a letter or number… without typing them. – Selino Jul 08 '20 at 18:58
  • `/[^a-zA-Z\d]/` then. Re-reading your question, you want every conceivable character that isn't a letter or number? If so, ASCII only or other character sets as well? – ggorlen Jul 08 '20 at 18:58
  • I think that what you're writing is a regex used to filter. It does not generate the characters. I would have to type them and then run the regex. That's not what I want. I want an algorithm or method that hands it to me… like a delicious dream of laziness. Can I use a regex in a loop to write characters? – Selino Jul 08 '20 at 19:01
  • There are a lot of characters, please specify which ones you want. You want all Turkish, Russian, Chinese, Japanese etc. characters in the entire Unicode set? Just ASCII? Just printable characters? It's totally unclear what "every character I could type" entails. This very much feels like an [xy problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Describing what the point of this is might help offer a better solution to whatever you mean to achieve by this. – ggorlen Jul 08 '20 at 19:02
  • I think you really don't know about charsets. The number of "conceivable characters" is HUGE. We're talking about many thousands of characters across all the world's charsets, probably many tens of thousands. AFAIU, generating a string with "every conceivable character" is not possible due to differences in worldwide charsets. – Nelson Teixeira Jul 08 '20 at 19:09

2 Answers


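If you only care about a restricted subset of possible symbols (e.g. the 65,536 UTF-16 code units, which cover Unicode's Basic Multilingual Plane), your outlined approach is right on and easy to fill in. Note that the filter only removes ASCII letters and digits, so letters from other scripts stay in the result: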

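// Every UTF-16 code unit, 0x0000 through 0xFFFF (0x10000 entries).
let utf16Chars = Array.from({ length: 0x10000 }, (_, i) => String.fromCharCode(i));
// ASCII letters and digits only; non-ASCII letters are not filtered out.
let alphaNumeric = /[a-zA-Z0-9]/;
let symbols = utf16Chars.filter(e => !alphaNumeric.test(e));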

console.log(symbols.length);
console.log(symbols.slice(0,100));

I'll be honest, I know embarrassingly little about the terminology (UTF, ASCII, Unicode, etc.), so forgive me if I'm mistaken. But this SO post suggests UTF-16 could be what you're looking for, and this MDN post suggests 0-0xFFFF should get you the UTF-16 chars.
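If "every character I could type" means a US QWERTY keyboard, the printable ASCII range (0x20 through 0x7E) is probably enough. A minimal sketch under that assumption; the names `printableAscii` and `asciiSymbols` are made up for illustration:

```javascript
// Assumption: "typeable on QWERTY" means printable ASCII, 0x20 (space) to 0x7E ('~').
const printableAscii = Array.from(
  { length: 0x7e - 0x20 + 1 },
  (_, i) => String.fromCharCode(0x20 + i)
);

// Drop ASCII letters and digits; what remains is punctuation plus the space.
const asciiSymbols = printableAscii.filter((ch) => !/[a-zA-Z0-9]/.test(ch));

console.log(asciiSymbols.length); // 33 (32 punctuation characters and the space)
console.log(asciiSymbols.join(""));
```

If the space is not wanted, starting the range at 0x21 instead of 0x20 leaves it out.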

junvar

For the record, if you're going to use the list from @junvar, don't try to paste it into a regex character class. It won't work, as the code below shows: only the backslash is escaped here, so an unescaped `]` in the list closes the character class early. Do something similar to the forEach below the regex instead. Notice that it takes a long time to run, so better not to put this in anything that needs speed.

let utf16Chars = [...Array(0xFFFF)].map((_, i) => String.fromCharCode(i));
let alphaNumeric = /[\w\d]/;
let symbols = utf16Chars.filter(e => !e.match(alphaNumeric));

let ixSlash = symbols.findIndex(s=>s=='\\');
symbols[ixSlash] = '\\\\';

let rgxStr = '[' + symbols.join('') + ']';

let regEx = new RegExp(rgxStr, 'g');

let str = '(NT*B&TIUSDSJHni7*B&TY*&';
console.log('Regex match:', str.match(regEx)); //doesn't work

symbols.forEach(s=>{
  let ix = str.indexOf(s);
  if (ix >= 0) console.log('Match at position ' + ix + ' (' + s + ')');
})
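A generated list can still be turned into a working character class if each entry is escaped first. The sketch below assumes printable ASCII only, and `escapeForClass` is a hypothetical helper, not part of either answer. It also shows that a plain negated class gives the same matches without generating any list:

```javascript
// Hypothetical helper: backslash-escape characters that are special inside [...].
const escapeForClass = (ch) => ch.replace(/[\\\]\[^-]/g, "\\$&");

// Printable ASCII symbols only (0x20 to 0x7E), to keep the pattern small.
const classSymbols = Array.from(
  { length: 0x7f - 0x20 },
  (_, i) => String.fromCharCode(0x20 + i)
).filter((ch) => !/[a-zA-Z0-9]/.test(ch));

// Build the character class from the escaped symbols.
const symbolClass = new RegExp("[" + classSymbols.map(escapeForClass).join("") + "]", "g");

const sample = "(NT*B&TIUSDSJHni7*B&TY*&";
const classMatches = sample.match(symbolClass);
const negatedMatches = sample.match(/[^a-zA-Z0-9]/g); // same idea, no generated list

console.log(classMatches);   // [ '(', '*', '&', '*', '&', '*', '&' ]
console.log(negatedMatches); // same seven matches
```

For matching purposes the negated class is the simpler choice; the generated list is only needed when the characters themselves are the goal.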
Nelson Teixeira