Converting from Unicode (or any other character encoding) to a custom encoding and vice versa - Is it possible?

Question

Because Unicode has a complex encoding for every character, is it possible to create a custom encoding, i.e. a converter from the custom encoding to Unicode and vice versa, so that users can easily type Unicode characters on the web? I will try to explain my idea.

For example, I would make a webpage with a text field and an output div. The user would type a custom code. JavaScript or PHP would take the input value of the field, look it up in a "coding book", convert it to the corresponding predefined Unicode character, and display it in the output div. If this is possible, where should the "coding book" file be placed so that the code can compare input values against it and find the corresponding Unicode values?

For example: Aa568 (user input, custom code) ---> U+00E7 (converted value) ---> ç (displayed value).

I need this for a project I am working on, so I want to know whether there is any way to achieve this.

UPDATE:

My question is connected to my posts about an Egyptian hieroglyphic writer. I found a hieroglyphic font that contains 7950 hieroglyphic characters, and I would like to make this font typable by assigning predefined codes (the conventional Egyptological coding) to each of its characters, so that users can type the conventional code and get the specific character from the font.

This question seems to have rather little to do with Unicode, actually, and is more a case of “how can I replace user input X with text Y.” But your question is rather too broad, and “is it possible” questions aren’t very welcome here to begin with. Please go read [ask]. And then edit your question to tell us which specific parts of this you are having problems with. – CBroe, May 09 '18 at 07:05
Because the users already know this "custom code", it is easier for them to use it than Unicode escapes. – Boris J., May 09 '18 at 13:57

score 0 · Answer 1 · edited Jun 20 '20 at 09:12

First, some background to clarify things; maybe this will already help you.

Unicode

Unicode consists of code points, where each assigned code point identifies one character. As you stated correctly in your example, the code point U+00E7 represents the character ç. According to Wikipedia, there are 1,114,112 code points, divided into 17 planes of 65,536 (2^16) code points each. (The plane size comes from the 16-bit range; it is not derived from fonts, although a single OpenType font is similarly limited to 65,535 glyphs.) These code points are abstract numbers: they say nothing about how a character is stored in memory.
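To make the code point idea concrete, here is a small JavaScript sketch. The helper name `toCodePointLabel` is made up for this illustration; `codePointAt` and `String.fromCodePoint` are standard JavaScript string functions.

```javascript
// Hypothetical helper: format the code point of the first character of a
// string in the usual "U+XXXX" notation.
function toCodePointLabel(ch) {
  const cp = ch.codePointAt(0);
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// Character -> code point ("\u00E7" is the character ç).
console.log(toCodePointLabel("\u00E7"));   // U+00E7

// Code point -> character.
console.log(String.fromCodePoint(0x00e7)); // ç

// Code points outside the first plane work the same way,
// e.g. U+13000 (EGYPTIAN HIEROGLYPH A001).
console.log(toCodePointLabel("\u{13000}")); // U+13000

// 17 planes of 65,536 code points each.
console.log(17 * 0x10000);                 // 1114112
```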

Encoding

Unicode itself is NOT an encoding in the sense of how a character is stored on your PC. A code point is stored using one of several encodings; the two most widely used are UTF-8 and UTF-16. UTF-8 uses one to four bytes per character: ASCII characters are stored in a single byte, everything above that in multiple bytes. For example, ç becomes C3 A7 in memory. UTF-16 uses two bytes for characters in the first plane and four bytes (a surrogate pair) for everything else, so ç becomes 00 E7 (big-endian) in memory. This is how your PC will usually see the characters, not as raw code points (UTF-32, which stores each code point directly in four bytes, is the exception). These encodings can then be decoded back to code points to find the correct Unicode character.
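The byte values above can be checked with a short JavaScript sketch. The helper names `utf8Hex` and `utf16Hex` are hypothetical; `TextEncoder` (which always produces UTF-8) and `charCodeAt` (which returns UTF-16 code units) are standard in browsers and Node.js.

```javascript
// Hypothetical helper: the UTF-8 bytes of a string as upper-case hex.
function utf8Hex(str) {
  const bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  return Array.from(bytes, b => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

// Hypothetical helper: the UTF-16 code units of a string as upper-case hex.
function utf16Hex(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0"));
  }
  return units.join(" ");
}

console.log(utf8Hex("\u00E7"));     // C3 A7
console.log(utf16Hex("\u00E7"));    // 00E7

// A hieroglyph lies outside the first plane: four bytes in UTF-8
// and a surrogate pair (two code units) in UTF-16.
console.log(utf8Hex("\u{13000}"));  // F0 93 80 80
console.log(utf16Hex("\u{13000}")); // D80C DC00
```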

As you can see, something similar to your project already exists and is used worldwide, so it is definitely possible. You should ask yourself whether you really want to use a custom encoding or one of the widely used standard encodings (there are more than UTF-8 and UTF-16). Once you come up with a function that converts your custom encoding to a Unicode code point, using a "code book" or some rule you define, nothing stands in your way. How UTF-8 does this is explained to some extent here: https://linux.die.net/man/7/utf8.
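A "code book" converter of this kind could be sketched in JavaScript roughly as follows. This is only an illustration under assumptions: the names `codeBook`, `convertCodes`, `reverseBook` and `toCodes` are made up, codes are assumed to be separated by whitespace, and the two entries are examples ("Aa568" is the code from the question; "A1" is assumed here to be the conventional code for U+13000, EGYPTIAN HIEROGLYPH A001).

```javascript
// Illustrative code book: custom code -> Unicode code point.
// A real one would have one entry per character in the font.
const codeBook = new Map([
  ["Aa568", 0x00e7],
  ["A1", 0x13000],
]);

// Custom codes -> text. Tokens that are not in the book are left untouched.
function convertCodes(input, book = codeBook) {
  return input
    .trim()
    .split(/\s+/)
    .map(token => (book.has(token) ? String.fromCodePoint(book.get(token)) : token))
    .join(" ");
}

// The reverse direction: character -> custom code.
const reverseBook = new Map(
  [...codeBook].map(([code, cp]) => [String.fromCodePoint(cp), code])
);

// Text -> custom codes. Array.from iterates by code point, so characters
// outside the first plane are not split into surrogate halves.
function toCodes(text, book = reverseBook) {
  return Array.from(text, ch => book.get(ch) ?? ch).join(" ");
}

console.log(convertCodes("Aa568"));   // ç
console.log(toCodes("\u00E7"));       // Aa568
```

In a page, the converter would be called from an `input` event handler on the text field, writing the result to the output div with `textContent`. For a large code book, one option is to keep the entries in a separate JSON file and load it once at page start; whether that is preferable to a plain script file depends on the project.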

edited Jun 20 '20 at 09:12 by Community · answered May 09 '18 at 06:46 by J.Panek

Thank you for your answer. Some things are clearer now. I wrote an update above, so you can see what problem I really have. – Boris J. May 09 '18 at 14:21
Can you give a link to your other post and the font you are using? Do Unicode code points already exist for your Egyptian characters? – J.Panek May 09 '18 at 14:45
Of course. My other posts: https://stackoverflow.com/questions/50200146/user-input-showing-images-how-to-solve-some-of-script-problems https://stackoverflow.com/questions/50238810/javascript-showing-dummy-image-for-broken-links-does-not-functioning-and-more I tried to use images of hieroglyphs, but I got some problems with it. – Boris J. May 09 '18 at 16:14
Egyptian hieroglyphs have been in Unicode since version 5.2 (2009), and the set has been expanded over the years. Here is a hieroglyphic font which supports all available hieroglyphs: http://users.teilar.gr/~g1951d/AbydosFonts.zip And here is the documentation for it (check the references about the Unicode implementation): http://users.teilar.gr/~g1951d/AbydosDocs.zip – Boris J. May 09 '18 at 16:16
Do you already have a function that converts a code point to the corresponding glyph? – J.Panek May 10 '18 at 11:30
Unfortunately, I don't. I don't know how to approach this problem programmatically. I have an idea, but so far I haven't found the right approach. I know that I need a script which would compare user input with predefined codes, find the matching one, convert it, and display it. I also think that I need an external file with the 7950 codes for the script to compare against, but I am still searching for a solution. – Boris J. May 10 '18 at 16:35
I cannot help you with the JavaScript/PHP part. However, I may have implemented part of your idea in a C project of mine. First, you will have to find a way to display the glyph (or character, if you prefer to call it that) for a Unicode code point. I did it with a bitmap font on an embedded device. This might help you: https://ecmanaut.blogspot.de/2006/07/encoding-decoding-utf8-in-javascript.html – J.Panek May 10 '18 at 19:33
Thank you very much for this link. I will try to use it. I got the idea to use a JavaScript object as one part of my conversion script. It gives me keys and values, so I can compare my input to the keys and, if a match is found, display the value, which would be the Unicode code point or even the hieroglyph itself. – Boris J. May 11 '18 at 13:29
If you solve your problem, an update would be appreciated. And if my answer helped you solve it, I would be happy about an acceptance. – J.Panek May 14 '18 at 07:36
