2

For some reason Redcarpet markdown renders ' as &#39; while rendering ’ as ’. Are there two types of single quote? Why would Redcarpet treat one differently from the other? (The ASCII table seems to have only one, but I'm assuming Unicode has more?)

Searching for ' and ’ is a bit difficult as well, since Chrome's find (Command + F) and Google search seem to treat the two characters as one and the same.


Kedar Mhaswade
Reza Shirazian
  • I'm having some trouble understanding the question... you see that they are two different characters, otherwise you wouldn't be showing two different glyphs (one straight, one more hooked, at least as they show on my screen)... so of course that means they have different character codes. And if one of those has special meaning and the other doesn't, then the one gets escaped and the other doesn't. Am I missing something? – Mark Adelsberger Jan 30 '17 at 22:01
  • no you're not missing anything. What I want to know is what's the difference between the two characters, why Redcarpet escapes one and renders the other. Is there a better solution than `@markdown.render(a.content.gsub("'", "’"))` – Reza Shirazian Jan 30 '17 at 22:19
  • Markdown was conceived as a simplified way of writing text documents compared to HTML. The canonical rendering is HTML. ASCII doesn't have anything to do with HTML; all characters are Unicode, regardless of the document encoding and regardless of being written as numeric character entity references or not. Tip: To research a character, put it in the U+hhhhh format, where hhhhh is the 4 to 5 hex digits for the codepoint. – Tom Blodget Jan 31 '17 at 01:30

2 Answers

8

Yes, there are. These two quote characters are:

hex(decimal) codepoint = 2019(8217) and character = ’
hex(decimal) codepoint = 27(39) and character = '

The code points are distinct. In each line above, the first number is the hexadecimal value of the code point and the number in parentheses is its decimal value.
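A quick way to confirm this from Ruby (the language Redcarpet is used from) is to ask each character for its code point. This is only a minimal sketch; `describe` is a hypothetical helper name introduced for illustration.

```ruby
# Hypothetical helper: report a character's code point in hex and decimal.
def describe(char)
  code = char.ord # integer code point of the first character in the string
  format("U+%04X (%d)", code, code)
end

straight = "'"      # APOSTROPHE
curly    = "\u2019" # RIGHT SINGLE QUOTATION MARK

puts describe(straight) # prints U+0027 (39)
puts describe(curly)    # prints U+2019 (8217)
puts straight == curly  # prints false: they are different characters
```

Writing the curly quote as the escape `"\u2019"` keeps the source unambiguous, since the two glyphs are easy to confuse in an editor.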

According to the Unicode standard, the first one is:

2019;RIGHT SINGLE QUOTATION MARK;Pf;0;ON;;;;;N;SINGLE COMMA QUOTATION MARK;;;;

whereas the second one is

0027;APOSTROPHE;Po;0;ON;;;;;N;APOSTROPHE-QUOTE;;;;

Perhaps Redcarpet should be using proper HTML entity escaping for the first type of quote. (This page says it should be escaped as an entity; the standard ones for U+2019 are &rsquo; and &#8217;.)

You are right that the second quote, ', is part of the 7-bit ASCII encoding.

Even though the first quote, ’, is nearly indistinguishable to the human eye from the second quote, ', you can search for it in Chrome or in any other browser or editor using your operating system's input method. Entering a character is the job of the so-called input method, and you can enter any character on a given operating system if you know the input methods it supports. For example, on the Mac:
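To illustrate why only one of the two gets escaped, here is a small sketch that uses Ruby's standard-library escaper as a stand-in. It is an assumption that this mirrors Redcarpet's behaviour; Redcarpet has its own escaping code, which may differ in detail. The straight apostrophe can delimit HTML attribute values, so escapers replace it, while the curly quote has no special meaning in HTML and passes through.

```ruby
require "erb"

straight = "O'Reilly"      # contains U+0027 APOSTROPHE
curly    = "O\u2019Reilly" # contains U+2019 RIGHT SINGLE QUOTATION MARK

# ERB::Util.html_escape replaces &, <, >, " and ' only.
escaped_straight = ERB::Util.html_escape(straight)
escaped_curly    = ERB::Util.html_escape(curly)

# Building a numeric character reference by hand, if one is wanted.
numeric_ref = format("&#%d;", "\u2019".ord)

puts escaped_straight # prints O&#39;Reilly
puts escaped_curly    # prints O’Reilly (unchanged)
puts numeric_ref      # prints &#8217;
```

In a UTF-8 document the curly quote does not need escaping at all, so leaving it as-is is valid HTML.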

  • Use the U+ keyboard (the Unicode Hex Input source), which appears on the menu bar.
  • Press Cmd + F in Chrome to start searching.
  • Keep the Alt (Option) key pressed and enter the Unicode hex value of the quote you are looking for (2019). What appears in the search box is ’. (In fact, this is what I did to type that quote!)

A similar facility is available on Linux and Microsoft Windows.
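When the text is available to a program rather than in a browser, the same search can be done by code point instead of by eye. The sketch below is one possible approach; `non_ascii_chars` is a hypothetical helper introduced for illustration, and the sample string is made up.

```ruby
# Hypothetical helper: list every non-ASCII character in a string together
# with its index and code point, so look-alike quotes become visible.
def non_ascii_chars(text)
  text.each_char.with_index
      .reject { |char, _index| char.ascii_only? }
      .map { |char, index| [index, char, format("U+%04X", char.ord)] }
end

sample = "It's O\u2019Reilly" # one straight and one curly apostrophe

p non_ascii_chars(sample) # only the curly quote is reported
p sample.index("\u2019")  # index of the first curly quote, or nil if absent
```

The straight apostrophe in "It's" is ASCII and is therefore skipped; only the curly one in the name is listed.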


Kedar Mhaswade
  • 3
    Well noticed. It should be observed that single Quotes and Apostrophes have different grammatical usages – rafaelc Jan 31 '17 at 13:22
0
// One way to handle this in a C# program.
// Example: removing the apostrophe from O’Reilly, whichever kind was typed.
using System;

char charToRemove = '\'';        // U+0027 APOSTROPHE, as typed on Android and Windows
char charToRemove1 = (char)8217; // U+2019 RIGHT SINGLE QUOTATION MARK, as typed on iOS/macOS
string myName = "O’Reilly";

myName = myName.Replace(charToRemove.ToString(), "");  // Android, Windows
myName = myName.Replace(charToRemove1.ToString(), ""); // iOS, macOS

Console.WriteLine(myName); // prints OReilly
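For comparison, the same idea can be sketched in Ruby, the language of the original question. The constant and method names below are hypothetical, chosen for illustration only. The second method normalizes instead of removing, which may be preferable when the apostrophe should survive.

```ruby
# Both apostrophe-like characters: U+0027 and U+2019.
APOSTROPHES = "'\u2019"

# Hypothetical helper: drop either kind of apostrophe from a name.
def strip_apostrophes(name)
  name.delete(APOSTROPHES)
end

# Hypothetical helper: turn the curly quote into a plain ASCII apostrophe.
def normalize_apostrophes(name)
  name.tr("\u2019", "'")
end

puts strip_apostrophes("O\u2019Reilly")     # prints OReilly
puts strip_apostrophes("O'Reilly")          # prints OReilly
puts normalize_apostrophes("O\u2019Reilly") # prints O'Reilly
```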