Every programming language has its own interpretation of `\n` and `\r`.
Unicode supports multiple characters that can represent a new line.
From the Rust reference:
> A whitespace escape is one of the characters U+006E (n), U+0072 (r), or U+0074 (t), denoting the Unicode values U+000A (LF), U+000D (CR) or U+0009 (HT) respectively.
Based on that statement, I'd say a Rust character is a new-line character if it is either `\n` or `\r`. On Windows it might be the combination of `\r` and `\n`. I'm not sure though.
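For comparison, here is a small check of how the standard library's `str::lines` treats these characters. The helper name `split_lines` is only for illustration. As far as I can tell, `lines` treats `\n` and `\r\n` as line endings, while a bare `\r` stays inside the line:

```rust
/// Illustrative helper (not part of std): collect the lines that
/// `str::lines` yields for the given text.
fn split_lines(text: &str) -> Vec<&str> {
    text.lines().collect()
}

fn main() {
    // "\n" and "\r\n" end a line; the bare "\r" between `c` and `d` does not.
    let parts = split_lines("a\nb\r\nc\rd");
    println!("{:?}", parts);

    // U+2028 (line separator) is not treated as a line ending either.
    let parts = split_lines("x\u{2028}y");
    println!("{}", parts.len());
}
```

So the standard library's notion of a line seems narrower than "either `\n` or `\r`", and it does not cover the other Unicode candidates at all.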
What about the following?
- Next line character (U+0085)
- Line separator character (U+2028)
- Paragraph separator character (U+2029)
In my opinion, we are missing something like a `char.is_new_line()`.
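To make the idea concrete, this is a minimal sketch of what I have in mind. The function name `is_new_line` and the set of characters it accepts are my own assumptions (the two escapes from the reference plus the three candidates listed above), not an existing API or an official Unicode definition:

```rust
/// Hypothetical helper, not part of the standard library.
/// Assumption: a "new-line character" is LF, CR, NEL, LS or PS.
fn is_new_line(c: char) -> bool {
    matches!(
        c,
        '\n'            // U+000A line feed
        | '\r'          // U+000D carriage return
        | '\u{0085}'    // U+0085 next line
        | '\u{2028}'    // U+2028 line separator
        | '\u{2029}'    // U+2029 paragraph separator
    )
}

fn main() {
    for c in ['\n', '\r', '\u{0085}', '\u{2028}', '\u{2029}', 'a', ' '] {
        println!("U+{:04X}: {}", c as u32, is_new_line(c));
    }
}
```

A per-character check like this cannot treat the Windows sequence `\r\n` as a single line ending, since that is two characters; it would report both of them as new-line characters.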
I looked through the Unicode Character Categories but couldn't find a definition for new-lines.
Do I have to come up with my own definition of what a Unicode new-line character is?