What's the best way to check whether a given string contains non-UTF-8 characters in Tcl? Is matching against the regexp "^[\x00-\x7f]+$" the only way forward?

I'm trying to write a Tcl proc to check if a given variable contains non-UTF-8 characters and, if it does, replace it with "Not supported".

egorulz

1 Answer

All Tcl's characters are Unicode characters.

OK, that's not helpful. You actually appear to be asking about non-ASCII characters. Supposing you wanted to replace each non-ASCII character with a ?, you might use a regular expression substitution, like this:

regsub -all {[\u0080-\uffff]} $inputString "?" outputString

The key points are that the RE is in braces (almost always strongly recommended), so Tcl passes it to the RE engine untouched, and that we're using \uXXXX escape sequences, which the RE engine also understands. This may insert a lot of ?s, one per non-ASCII character, but I'm sure you can adjust.
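If you want the whole value replaced rather than each character, as the question describes, the same character class can be used as a test instead of a substitution. This is only a sketch: the proc name asciiOrPlaceholder and its placeholder parameter are made up for illustration, and it assumes "non-ASCII" means anything in the \u0080-\uffff range used above.

```tcl
# Hypothetical helper (name and default placeholder are illustrative).
# Returns the input unchanged when it contains only ASCII characters,
# otherwise returns the placeholder text.
proc asciiOrPlaceholder {inputString {placeholder "Not supported"}} {
    # Braced RE with \uXXXX escapes: matches if any character is
    # in the range U+0080..U+FFFF, i.e. outside 7-bit ASCII.
    if {[regexp {[\u0080-\uffff]} $inputString]} {
        return $placeholder
    }
    return $inputString
}
```

For example, asciiOrPlaceholder "hello" gives back hello, while asciiOrPlaceholder "caf\u00e9" gives back Not supported.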

Donal Fellows