I want to find a way to get the symbol of a non-printable character in C# (e.g. "SOH" for start of heading and "BS" for backspace). Any ideas?

Edit: I don't need to visualize the byte value of a non-printable character, but its code as shown here https://web.itu.edu.tr/sgunduz/courses/mikroisl/ascii.html

Example would be "NUL" for 0x00, "SOH" for 0x01 etc.

Dmitry Bychenko

2 Answers

You are probably looking for a kind of string dump that makes control characters visible. You can do it with the help of regular expressions, where \p{Cc} matches any control character:

using System;
using System.Text.RegularExpressions;

...

string source = "BEL \u0007 then CR + LF  \r\n SOH \u0001 \0\0";

// To get control characters visible, we match them and
// replace with their codes
string result = Regex.Replace(
  source, @"\p{Cc}", 
  m => $"[Control: 0x{(int)m.Value[0]:x4}]");

// Let's have a look:

// Initial string 
Console.WriteLine(source);
Console.WriteLine();
// Control symbols visualized
Console.WriteLine(result);

Outcome:

BEL   then CR + LF  
 SOH  

BEL [Control: 0x0007] then CR + LF  [Control: 0x000d][Control: 0x000a] SOH [Control: 0x0001] [Control: 0x0000][Control: 0x0000]

Edit: If you want to visualize the characters in a different way, you should edit the lambda

m => $"[Control: 0x{(int)m.Value[0]:x4}]"

For instance:

    // ASCII abbreviations for the control codes 0x00..0x1F; the array index is the character code
    private static readonly string[] knownCodes = new string[] {
      "NUL", "SOH", "STX", "ETX", "EOT", "ENQ",
      "ACK",  "BEL", "BS", "HT", "LF", "VT",
      "FF", "CR", "SO", "SI", "DLE", "DC1", "DC2",
      "DC3", "DC4", "NAK", "SYN", "ETB", "CAN",
      "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
    };

    private static string StringDump(string source) {
      if (null == source)
        return source;

      return Regex.Replace(
        source, 
        @"\p{Cc}", 
        m => {
          int code = (int)(m.Value[0]);

          // Known codes get their abbreviation; any other control
          // character (e.g. 0x7f or 0x80..0x9f) falls back to its hex value
          return code < knownCodes.Length 
            ? $"[{knownCodes[code]}]" 
            : $"[Control 0x{code:x4}]";  
        });
    }

Demo:

Console.WriteLine(StringDump(source));

Outcome:

BEL [BEL] then CR + LF  [CR][LF] SOH [SOH] [NUL][NUL]
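
If you need the abbreviation for a single character rather than a dump of a whole string, the same table can be used without regular expressions. The following is only a minimal sketch: the class name ControlNames and the methods GetName and Dump are hypothetical names chosen for illustration, and the "DEL" entry for 0x7F is an addition that is not part of the table above.

```csharp
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class ControlNames
{
    // Standard ASCII abbreviations for the control codes 0x00..0x1F.
    private static readonly string[] C0 =
    {
        "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
        "BS",  "HT",  "LF",  "VT",  "FF",  "CR",  "SO",  "SI",
        "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
        "CAN", "EM",  "SUB", "ESC", "FS",  "GS",  "RS",  "US",
    };

    // Returns the abbreviation of an ASCII control character,
    // or null when the character has no entry in the table.
    public static string GetName(char c)
    {
        if (c < C0.Length) return C0[c];
        if (c == '\u007F') return "DEL"; // assumption: DEL is wanted as well
        return null;
    }

    // Replaces every named control character with "[NAME]" and
    // copies all other characters unchanged.
    public static string Dump(string source)
    {
        if (source == null) return null;

        var sb = new StringBuilder(source.Length);
        foreach (char c in source)
        {
            string name = GetName(c);
            if (name != null) sb.Append('[').Append(name).Append(']');
            else sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With this sketch, ControlNames.GetName('\u0001') gives "SOH" and ControlNames.GetName('\b') gives "BS". Unlike the regex version, control characters in the range 0x80..0x9F are left untouched here.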
Dmitry Bychenko
  • Thanks for the idea, but this is just a visualized byte value of a character. I need the actual code of a character, as shown here https://web.itu.edu.tr/sgunduz/courses/mikroisl/ascii.html – Alex Kovaliv Nov 19 '21 at 09:33
  • @Alex Kovaliv: if you want to visualize in a different way, you should change the lambda. I've edited the answer. – Dmitry Bychenko Nov 19 '21 at 09:46

In Visual Studio, you can just display the SOH character (U+0001), for example, and then encode it like this:

var bytes = Encoding.UTF8.GetBytes("☺");

And now you can do whatever you like with it. For Backspace, use U+232B (⌫, ERASE TO THE LEFT).
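
If a single visible glyph per control character is what you are after, Unicode also has a dedicated Control Pictures block: U+2400..U+241F are the symbols for the control codes 0x00..0x1F (for example ␁ for SOH and ␈ for BS), and U+2421 is the symbol for DEL. A minimal sketch of that mapping follows; the class name ControlPictures and its methods are hypothetical, and whether the glyphs actually render depends on the font and console in use.

```csharp
using System.Text;

// Hypothetical helper, shown only to illustrate the Control Pictures mapping.
public static class ControlPictures
{
    // Maps U+0000..U+001F to U+2400..U+241F and DEL (U+007F) to U+2421;
    // every other character is returned unchanged.
    public static char ToPicture(char c)
    {
        if (c < 0x20) return (char)(0x2400 + c);
        if (c == '\u007F') return '\u2421';
        return c;
    }

    // Applies ToPicture to every character of the string.
    public static string Visualize(string source)
    {
        if (source == null) return null;

        var sb = new StringBuilder(source.Length);
        foreach (char c in source) sb.Append(ToPicture(c));
        return sb.ToString();
    }
}
```

When writing the result to the console, Console.OutputEncoding may need to be set to Encoding.UTF8 for the symbols to show up.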

D A
  • I don't need the byte value of a character. I want to know the actual code of a non-printable character, as shown in this table https://web.itu.edu.tr/sgunduz/courses/mikroisl/ascii.html – Alex Kovaliv Nov 19 '21 at 09:28