How can I use Japanese language characters in variable naming?

Question

I'm trying to use Japanese words in variable naming, but the C language doesn't appear to work this way. Is there any way to fix this?

See my code below.

#include <stdio.h>

struct 忍び/* Shinobi */
{
    char 名前/* Namae */[50];
    char 血液型/* Ketsuekigata */[3];
    char 性別/* Seibetsu */[10];
    char 星占い/* Hoshi uranai*/[10];
    int 年齢/* Nenrei */;
};

int main(void)
{
    struct 忍び Uchiha_Itachi;
    Uchiha_Itachi.年齢 = 21;
    printf("the age of itachi > %d", Uchiha_Itachi.年齢);
}

"I'm trying to use Japanese words in variable naming": is that because you want no one who can't read Japanese to be able to understand your code? — Swordfish, Nov 09 '18 at 11:03
Possible duplicate of [What are the different character sets used for?](https://stackoverflow.com/questions/27872517/what-are-the-different-character-sets-used-for) — Swordfish, Nov 09 '18 at 11:04
@usr2564301 lol, you might want to travel back to before things like Google Translate existed. — Swordfish, Nov 09 '18 at 11:04
I just want to write a program for myself, for fun, you know. Of course, I write public software in English. — wheldrake, Nov 09 '18 at 11:05
@Swordfish: I was more thinking of `τ = 2*π;`. There was a time (old) people said "why must we add support for that newfangled key `~` when you can just type in `??-` like everybody else does?". — Jongware, Nov 09 '18 at 11:12
@wheldrake: This is possible in C++. See this question: https://stackoverflow.com/questions/52586368/can-c-variables-in-cpp-file-defined-as-special-symbols-%CE%B2/52586769#52586769 — P.W, Nov 09 '18 at 11:14
Doing this is bad practice for any purpose and _will_ come back to haunt you sooner or later. Keep identifiers and comments in English. — Lundin, Nov 09 '18 at 11:33
@Gerhardh, the compiler is GNU GCC, and I didn't get what you mean by "which options?". — wheldrake, Nov 09 '18 at 11:35
@wheldrake If you want others to read them, then yes. For example, if you dump a piece of problematic code on SO and expect help, then provide comments in Japanese... — Lundin, Nov 09 '18 at 11:43
From my reading of the standard, some usage of universal characters should be allowed but I need to dig further to fully understand what it means. Sorry. — Gerhardh, Nov 09 '18 at 11:52
@Lundin, I said this is a private project for fun, I don't think anybody will see this code besides me. — wheldrake, Nov 09 '18 at 11:55
@Gerhardh, I think it's C11. I wanted to die while trying to find out which standard version this compiler is using. — wheldrake, Nov 09 '18 at 12:17
I like to use the "philosophy" ["UTF-8 Everywhere"](https://utf8everywhere.org/) ... except when US-ASCII is enough :) --- C source code (with comments) needs no more than ASCII. — pmg, Nov 09 '18 at 12:17
So it wasn't you who just posted a question on SO, but a burglar or your cat walking across the keyboard or something? :) — Lundin, Nov 09 '18 at 12:18
I just posted this to ask a question. The comments give the pronunciations of the respective variables. I thought it was simple code and a simple question for a human being to understand. — wheldrake, Nov 09 '18 at 12:24
You might like [C99 5.2.1 Character sets](http://port70.net/~nsz/c/c99/n1256.html#5.2.1). — pmg, Nov 09 '18 at 12:32
All the comments about keeping things "in English" are wrong; this is about the character set the C language supports for identifiers. [a-z, A-Z, 0-9] are not the sole property of English: you can have variables like `menge` or `quantita` (fudging the grave accent on that final 'a'), and you can also have phonetic replacements of non-Romanesque words, like `ryo`. I understand the OP's question, and I understand that this is not what is being asked for, but I want to clarify: the C standard requires [a-z, A-Z, 0-9], not English. — landru27, Nov 09 '18 at 15:03

chux - Reinstate Monica · Accepted Answer · 2018-11-09T17:29:34.790

C language doesn't appear to work this way. Is there any way to fix this?

Support for such characters is implementation-defined. Many compilers do not support this; a few might.

An identifier may contain nondigits (a-z, A-Z, _), digits (0-9), universal-character-names, or other implementation-defined characters, but it may not begin with a digit. (C17 6.4.2.1 1)

Alternative

Since C99, code can use universal-character-names of the form \unnnn or \Unnnnnnnn (each n a hexadecimal digit), a not-so-pretty possibility.

One application is to convert source code such as struct 忍び (which worked on one compiler) to struct \u5fcd\u3073 for other compilers.

https://www.branah.com/unicode-converter
忍び --> \u5fcd\u3073

#include <stdio.h>

struct \u5fcd\u3073/* Shinobi */ {
    char \u540d\u524d /* Namae */[50];
    char \u8840\u6db2\u578b /* Ketsuekigata */[3];
    char \u6027\u5225/* Seibetsu */[10];
    char \u661f\u5360\u3044/* Hoshi uranai*/[10];
    int \u5e74\u9f62/* Nenrei */;
};

int main(void) {
    struct \u5fcd\u3073 Uchiha_Itachi;
    Uchiha_Itachi.\u5e74\u9f62 = 21;
    printf("the age of itachi > %d", Uchiha_Itachi.\u5e74\u9f62);
}

Note: a #define like the one below is not guaranteed to work either. Support for it is implementation-defined.

// not certain to work
#define 忍び \u5fcd\u3073

If there is a strong need for "any way to fix this", write your source code as a .wheldrake file and translate it to a standard .c file.

Soapbox

One character I would like to use: ≠, the not_equal sign.

@All, IMO, this "universal-character-name or other implementation-defined characters." is a step toward allowing gradual acceptance of non-ASCII source code evolution. Might take 20 years. — chux - Reinstate Monica, Nov 09 '18 at 18:17
@usr2564301 Still, it is not clear how this code or answer evokes "Donald Knuth's source code" or "full of arrows and other relational symbols". Skimmed [_The Art of Computer Programming_](http://broiler.astrometry.net/~kilian/The_Art_of_Computer_Programming%20-%20Vol%201.pdf) and failed to find it "full of arrows and other relational symbols" other than `=`. — chux - Reinstate Monica, Nov 09 '18 at 19:13
@usr2564301 I see. Thanks for the _The Art of Computer Programming_ lead. Something to peruse someday. — chux - Reinstate Monica, Nov 09 '18 at 19:17

Blaze · Answer 2 · 2018-11-09T11:15:26.277


It's not possible; at least, there's no portable way (some compilers may allow it nonetheless, while others won't). According to this:

An Identifier can only have alphanumeric characters (a-z, A-Z, 0-9) and underscore (_).

It might, however, work using macros. To try that, add this line before you first use "忍び":

#define 忍び Shinobi

However, I wouldn't recommend it. It also isn't portable and some compilers might allow certain symbols while others won't.

answered Nov 09 '18 at 11:04 by Blaze, edited Nov 09 '18 at 11:15


Isn't the macro name an identifier as well? – Gerhardh Nov 09 '18 at 11:13
Not in a C/C++ sense. It gets replaced before the compiler gets to see it. Whether that works depends on the capabilities of the preprocessor. – Swordfish Nov 09 '18 at 11:15
Somewhat true, but in ISO/IEC 9899:2011 (E), A.3, a `control-line` is defined by a number of rules which all have an `identifier` as the first token after `#define`. Therefore I would assume that the same rules apply as for other identifiers. – Gerhardh Nov 09 '18 at 11:23
Maybe Annex D is the answer. Also, section A.1.3 defines an `identifier-nondigit` where one option is a `universal-character-name`. – Gerhardh Nov 09 '18 at 11:23
It gives the following error when I try that: "error: macro names must be identifiers". – wheldrake Nov 09 '18 at 11:27
That's right; my IDE even complains with "expected an identifier" when I try to define `3a` as a macro, for instance, while at the same time it doesn't care about a lot of other rules (it had no qualms with `忍び` as a variable name). The reason I mentioned macros is that some setups allow macro names that they won't allow as variable names. So if the OP wants to write a program for fun, as he says, it's worth a try. – Blaze Nov 09 '18 at 11:30
@Gerhardh `universal-character-name` means escaped forms like `\u1234` or `\U00001234`. For `identifier-nondigit`, Japanese characters written directly fall under the classification `other implementation-defined characters` (C17 draft 6.4.2.1). – user694733 Nov 09 '18 at 12:08
@user694733 OK, thanks. If they have to be escaped, I'm not sure how useful this might be... – Gerhardh Nov 09 '18 at 13:21
That macro identifiers are (logically, at least) replaced with other text in translation phase 4 in no way implies that they are not subject to the same constraints as other identifiers. In fact, [the lexical rules of the language](http://port70.net/~nsz/c/c11/n1570.html#6.4) do not distinguish between macro identifiers and other identifiers. – John Bollinger Nov 09 '18 at 15:51
