Converting from letters to their relative positions in the alphabet (c#)

Question

A while ago I found this code and I want to understand how it works:

int index = (int)c % 32 +1;

I have used this line successfully to convert letters into numbers: for example, a becomes 1 (and so does A). Could someone please explain how this happens? (I've looked a bit into base 32 but am no wiser.) Also, would there be an easy way to convert the integer back into a letter?

That `+ 1` doesn’t actually make sense there. With it, `'a'` and `'A'` will give you 2 (instead of 1). The idea was maybe to make it `- 1` to get an index starting at zero instead. — poke, Nov 09 '14 at 14:10
Be careful about your terms. You seem to be focusing on the English alphabet. There are "letters" used in English writing that aren't in the English alphabet and, of course, there are other alphabets. Usually, we leave the term "alphabet" for linguists and [language academies](http://en.wikipedia.org/wiki/List_of_language_regulators). Unicode attempts to provide complete writing systems, consisting of characters of various categories. Some characters, it classifies as "letters"—93,455 letters out of 1,112,064 "characters." [LINQPad Instant Share](http://share.linqpad.net/ngvi3p.linq) — Tom Blodget, Nov 09 '14 at 21:13

score 6 · Answer 1 · answered Nov 09 '14 at 14:01

Every letter has an integer code. For example, 'a' has code 97, so (int)'a' is 97 and (int)'a' % 32 is 1. Because there are fewer than 32 English letters, every letter converts to a distinct value. Also, by a happy coincidence, the difference between a lower-case letter and its upper-case counterpart is 32 (for example 'a' - 'A' == 32), so this works for upper-case letters too.

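To convert an integer back to a letter, you can also use the integer codes. For example, 'A' + index - 1 gives the code of the upper-case letter at position index in the alphabet, and 'a' + index - 1 gives the lower-case letter at the same position. In C# the result of that arithmetic is an int, so cast it back: (char)('A' + index - 1).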
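As a rough sketch of the reverse conversion described above (the class name `LetterFromIndex`, the method `ToLetter`, and the range check are my own additions for illustration, not part of the original code):

```csharp
using System;

public static class LetterFromIndex
{
    // Hypothetical helper: maps 1..26 to 'A'..'Z' (or 'a'..'z').
    public static char ToLetter(int index, bool upperCase = true)
    {
        if (index < 1 || index > 26)
            throw new ArgumentOutOfRangeException(nameof(index), "Expected a value from 1 to 26.");

        char first = upperCase ? 'A' : 'a';
        // char + int yields an int in C#, so an explicit cast back to char is needed.
        return (char)(first + index - 1);
    }

    public static void Main()
    {
        Console.WriteLine(ToLetter(1));         // A
        Console.WriteLine(ToLetter(26));        // Z
        Console.WriteLine(ToLetter(3, false));  // c
    }
}
```

The range check is optional, but without it values outside 1 to 26 would silently produce non-letter characters.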

Look at an ASCII table, for example, to see the integer codes of the symbols.

answered Nov 09 '14 at 14:01

justanothercoder


The difference of 32 between `a` and `A` isn’t the only (necessary) coincidence. More important is that 65 and 97 are both 1 modulo 32. – poke Nov 09 '14 at 16:15
Yes, I meant that they are both 1 modulo 32 because difference between them is 32. – justanothercoder Nov 09 '14 at 16:49
Is this really just coincidence? I know that early computer programmers were exceedingly forward thinking in some respects, I can't help but wonder if they did this on purpose too. – C Bauer Nov 09 '14 at 17:01
One possible reason for doing this is connected with bit operations. 32 is 2^5, so to add or subtract 32 you can set or unset a single bit. That is a very fast operation. – justanothercoder Nov 09 '14 at 17:07
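To make the bit-operation remark above concrete, here is a minimal sketch. It assumes the input is already one of the letters A-Z or a-z; the names `CaseBit`, `ToLowerAscii` and `ToUpperAscii` are hypothetical, and for other characters the results are not meaningful.

```csharp
using System;

public static class CaseBit
{
    // 32 == 2^5: the only bit that differs between 'A' (65) and 'a' (97).
    private const int CaseMask = 0x20;

    // Setting the bit turns an upper-case ASCII letter into lower case.
    public static char ToLowerAscii(char c) => (char)(c | CaseMask);

    // Clearing the bit turns a lower-case ASCII letter into upper case.
    public static char ToUpperAscii(char c) => (char)(c & ~CaseMask);

    public static void Main()
    {
        Console.WriteLine(ToLowerAscii('Q'));  // q
        Console.WriteLine(ToUpperAscii('q'));  // Q
        Console.WriteLine('A' ^ 'a');          // 32
    }
}
```

For real text, `char.ToUpperInvariant` and `char.ToLowerInvariant` are the safer choice; the sketch only illustrates why the 32-apart layout is convenient.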

score 4 · Accepted Answer · answered Nov 09 '14 at 14:09

This is all due to how characters (and strings) are actually represented. Every character is encoded using code points, which are just numbers. Many code points make up a code page which is essentially a table that maps a number to an actual character.

Ignoring the large code pages that come with Unicode, you can just take a look at ASCII for now, which is the encoding for the first 128 code points. There, you can see that the standard upper case alphabet starts at the number 65, while the lower case alphabet starts at the number 97.

So in your formula, if we assume that c is always a character from the alphabet, we know that its numerical value is between 65 and 90, or between 97 and 122. So taking the character 'A' or 'a', we have a value of 65 or 97 respectively.

All that’s left is the coincidence that the upper case and lower case alphabet start at a difference of 32, and that 65 modulo 32 is 1. This makes (int)c % 32 give you the index of the character in the alphabet starting at 1.
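The arithmetic can be checked directly. The following is only an illustrative sketch: `LetterIndex` and `ToIndex` are names I made up, and the helper wraps the `(int)c % 32` expression discussed in this answer.

```csharp
using System;

public static class LetterIndex
{
    // Hypothetical helper wrapping the expression discussed above.
    // Only meaningful for 'A'..'Z' and 'a'..'z'. Other characters also yield
    // a number (for example '!' is 33, and 33 % 32 is 1), so validate first.
    public static int ToIndex(char c) => c % 32;

    public static void Main()
    {
        Console.WriteLine((int)'A');      // 65
        Console.WriteLine(65 % 32);       // 1
        Console.WriteLine(97 % 32);       // 1
        Console.WriteLine(ToIndex('Z'));  // 26
        Console.WriteLine(ToIndex('z'));  // 26
    }
}
```

If the input may contain non-letters, a check such as `(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')` before calling the helper avoids misleading results.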

score 2 · Answer 3 · answered Nov 09 '14 at 14:03

This works because a .NET char is a UTF-16 code unit. For the letters A-Z and a-z, its numeric value matches the ASCII code.

If you look at an ASCII table, you'll see that uppercase A-Z are encoded starting at position 65, and lowercase a-z are encoded starting at position 97. The formula you provide will return 1 for uppercase or lowercase 'A', 2 for uppercase or lowercase 'B', etc.

From that table, you should also be able to convert an integer between 1 and 26 into a character of a case of your choosing: add 64 to the integer for uppercase, or 96 for lowercase.

edited Nov 10 '14 at 01:03

answered Nov 09 '14 at 14:03

pmcoltrane


A .NET String is a counted sequence of UTF-16 code-units. UTF-8 is the default encoding for streams and files. – Tom Blodget Nov 09 '14 at 20:45
