TL;DR: Printable characters inside the ASCII range (0-127) will be shown as graphic characters.* Everything else will be escaped.
* Except for double quotes (as they're used as string delimiters) and backslashes (because they're needed for escaping).
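You can check the TL;DR with a few calls to show on String values. The comments give the output I'd expect from GHC's Show instance. Note that show's output is always pure ASCII, so printing it never runs into console-encoding trouble.

```haskell
-- How 'show' renders different kinds of characters inside a String.
main :: IO ()
main = do
  putStrLn (show "cat")         -- "cat"          printable ASCII is kept
  putStrLn (show "café")        -- "caf\233"      é (U+00E9) is outside ASCII
  putStrLn (show "say \"hi\"")  -- "say \"hi\""   double quotes are escaped
  putStrLn (show "a\\b")        -- "a\\b"         backslashes are escaped
  putStrLn (show "tab\there")   -- "tab\there"    ASCII control chars are escaped
```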
Let's have a look at the source code to figure this one out!
Since we have String = [Char], we should hunt for instance Show Char in the source. It can be found here. It is defined as:
-- | @since 2.01
instance Show Char where
showsPrec _ '\'' = showString "'\\''"
showsPrec _ c = showChar '\'' . showLitChar c . showChar '\''
showList cs = showChar '"' . showLitString cs . showChar '"'
So showing a String (using showList) is basically a wrapper around showLitString, and showing a Char is a wrapper around showLitChar.
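You can see the two paths apart by comparing a Char with a one-character String. The special case for '\'' exists only in the Char instance. Double quotes are escaped only by showLitString, which is used for Strings.

```haskell
-- Compare 'show' on a Char (showsPrec) and on a String (showList).
main :: IO ()
main = do
  putStrLn (show 'a')    -- 'a'
  putStrLn (show "a")    -- "a"
  putStrLn (show '\'')   -- '\''  special-cased in showsPrec
  putStrLn (show "'")    -- "'"   a single quote needs no escape in a String
  putStrLn (show '"')    -- '"'   a double quote needs no escape in a Char
  putStrLn (show "\"")   -- "\""  escaped by showLitString
```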
Let's look at those functions.
showLitString :: String -> ShowS
-- | Same as 'showLitChar', but for strings
-- It converts the string to a string using Haskell escape conventions
-- for non-printable characters. Does not add double-quotes around the
-- whole thing; the caller should do that.
-- The main difference from showLitChar (apart from the fact that the
-- argument is a string not a list) is that we must escape double-quotes
showLitString [] s = s
showLitString ('"' : cs) s = showString "\\\"" (showLitString cs s)
showLitString (c : cs) s = showLitChar c (showLitString cs s)
-- [explanatory comments ...]
As you might've expected, showLitString is mostly a wrapper around showLitChar.
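To convince ourselves that this is all there is to showing a String, we can rebuild the same recursion on top of Data.Char.showLitChar, which is publicly exported. This is a sketch. The names myShowLitString and myShowString are hypothetical, chosen to avoid clashing with GHC.Show.

```haskell
import Data.Char (showLitChar)

-- Hypothetical re-implementation of the showLitString quoted above:
-- escape double quotes ourselves, delegate every other character.
myShowLitString :: String -> ShowS
myShowLitString []         s = s
myShowLitString ('"' : cs) s = showString "\\\"" (myShowLitString cs s)
myShowLitString (c : cs)   s = showLitChar c (myShowLitString cs s)

-- Hypothetical counterpart of showList for [Char]: add the surrounding quotes.
myShowString :: String -> String
myShowString str = (showChar '"' . myShowLitString str . showChar '"') ""

main :: IO ()
main = do
  let sample = "say \"hi\" to caf\233 \1234\&5"
  print (myShowString sample == show sample)  -- expected: True
```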
[Note: If you're unfamiliar with the ShowS type, this is a good answer to understand why it might be useful.]
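A tiny illustration of ShowS follows. ShowS is just a type synonym for String -> String, so you glue pieces together with (.) and get the final string by applying the result to a tail (often ""). This avoids repeated left-nested (++). The name greeting is made up for illustration.

```haskell
-- ShowS = String -> String; composing ShowS values builds the output
-- without repeatedly copying an ever-longer left operand of (++).
greeting :: ShowS
greeting = showString "Hello, " . showString "world" . showChar '!'

main :: IO ()
main = do
  putStrLn (greeting "")         -- Hello, world!
  putStrLn (greeting " (tail)")  -- Hello, world! (tail)
```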
Not quite what we were looking for, so let us go to showLitChar (I've omitted parts of the definition which aren't relevant to the question).
-- | Convert a character to a string using only printable characters,
-- using Haskell source-language escape conventions. For example:
-- [...]
showLitChar :: Char -> ShowS
showLitChar c s | c > '\DEL' = showChar '\\' (protectEsc isDec (shows (ord c)) s)
-- ^ Pattern matched for cat
showLitChar '\DEL' s = showString "\\DEL" s
showLitChar '\\' s = showString "\\\\" s
-- ^ Pattern matched for backslash
showLitChar c s | c >= ' ' = showChar c s
-- ^ Pattern matched for d
-- Some more escape codes
showLitChar '\a' s = showString "\\a" s
-- similarly for '\b', '\f', '\n', '\r', '\t', '\v' etc.
-- showLitChar ... = ...
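Now you can see where this behaviour comes from. ord c is an Int, and the first equation (the c > '\DEL' guard) matches every non-ASCII character, since ord '\DEL' == 127. Such characters are therefore shown as a backslash followed by their decimal code point.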
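Data.Char exports showLitChar, so you can try each branch directly. The last line shows protectEsc at work. When a decimal escape is followed by a digit, it inserts \&, so that \233 followed by 1 isn't read back as \2331.

```haskell
import Data.Char (showLitChar)

-- Feed single characters through showLitChar to see which equation fires.
main :: IO ()
main = do
  putStrLn (showLitChar 'd' "")      -- d        printable ASCII (c >= ' ')
  putStrLn (showLitChar '\\' "")     -- \\       backslash
  putStrLn (showLitChar '\DEL' "")   -- \DEL     code point 127
  putStrLn (showLitChar '\n' "")     -- \n       ASCII control character
  putStrLn (showLitChar '\233' "")   -- \233     é, caught by c > '\DEL'
  putStrLn (showLitChar '\233' "1")  -- \233\&1  protectEsc guards the digit
```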
Inside the ASCII range, printable characters are shown as-is and the rest are escaped. Outside it, every character is escaped.
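We can write this rule down as a small predicate and cross-check it against show itself. shownVerbatim and shownVerbatimByShow are hypothetical helpers. The predicate encodes my reading of the String case; for a Char, the special case is the single quote rather than the double quote. Data.Char.isPrint is included to highlight the mismatch: it accepts é, yet show still escapes it.

```haskell
import Data.Char (isPrint)

-- Hypothetical predicate: is this character shown as itself inside a String?
-- Printable ASCII (' ' .. '~'), minus the two characters that need escaping.
shownVerbatim :: Char -> Bool
shownVerbatim c = c >= ' ' && c < '\DEL' && c /= '\\' && c /= '"'

-- Ground truth: a character is shown verbatim iff show [c] only adds quotes.
shownVerbatimByShow :: Char -> Bool
shownVerbatimByShow c = show [c] == ['"', c, '"']

main :: IO ()
main = do
  -- Agreement over the first 1024 code points (expected: True).
  print (all (\c -> shownVerbatim c == shownVerbatimByShow c) ['\0' .. '\1023'])
  -- isPrint says é is printable, yet show escapes it (expected: (True,False)).
  print (isPrint '\233', shownVerbatim '\233')
```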
The code doesn't answer the "why" part of the question. The answer to that
(I think) is in the very first comment that we saw:
-- | @since 2.01
instance Show Char where
If I had to guess, I'd say this behaviour has been kept around to maintain backwards compatibility. Actually, there's no need to guess: see the comments for some good answers to this.
Bonus
We can do a git blame online using GHC's GitHub mirror ;). Let's see when this code was written (blame link).
The relevant commit is 15 years old (!). However, it does mention Unicode.
The functionality to distinguish between different types of Unicode characters is present in the Data.Char module. Looking at the source:
isPrint c = iswprint (ord c) /= 0
foreign import ccall unsafe "u_iswprint"
iswprint :: Int -> Int
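With isPrint available, a Unicode-friendly rendering is only a few lines away, which is what makes the "why" question interesting. The sketch below is hypothetical. showStringUnicode is not part of base, and GHC's show does not behave this way. It keeps every character isPrint accepts, escapes " and \, and falls back to showLitChar for the rest, so the result should still be accepted by read.

```haskell
import Data.Char (isPrint, showLitChar)

-- Hypothetical alternative to show for Strings: keep every character that
-- Data.Char.isPrint accepts instead of stopping at '\DEL'.
showStringUnicode :: String -> String
showStringUnicode str = '"' : go str
  where
    go []          = "\""
    go ('"' : cs)  = '\\' : '"' : go cs
    go ('\\' : cs) = '\\' : '\\' : go cs
    go (c : cs)
      | isPrint c  = c : go cs
      -- Pass the rest on as the continuation so protectEsc can still add \&.
      | otherwise  = showLitChar c (go cs)

main :: IO ()
main = do
  let s = "café \"λ\"\n"
  -- Round-trips through read (expected: True).
  print (read (showStringUnicode s) == s)
  -- Unlike show, the é survives unescaped (expected: True).
  print ('é' `elem` showStringUnicode s)
```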
If you trace the commit which introduced iswprint, you'll end up here. That commit was made 13 years ago.
Maybe enough code had been written in those two years that they didn't want to break it? I don't know. If some GHC developer could shed more light on this, that'd be awesome :). Daniel Wagner and Paul Johnson have pointed out a very good reason for this in the comments: interoperating with non-Unicode systems must have been a high priority (~15 years ago), as Unicode was relatively new back then.