
I'm trying to use Japanese words in variable names, but the C language doesn't appear to work this way. Is there any way to fix this?

See my code below.

#include <stdio.h>

struct 忍び/* Shinobi */
{
    char 名前/* Namae */[50];
    char 血液型/* Ketsuekigata */[3];
    char 性別/* Seibetsu */[10];
    char 星占い/* Hoshi uranai*/[10];
    int 年齢/* Nenrei */;
};

int main(void)
{
    struct 忍び Uchiha_Itachi;
    Uchiha_Itachi.年齢 = 21;
    printf("the age of itachi > %d", Uchiha_Itachi.年齢);
}
wheldrake
  • "I'm trying to use Japanese words in variable naming" because you don't want anyone who can't read Japanese to be able to understand your code? – Swordfish Nov 09 '18 at 11:03
  • @Swordfish: that's indeed a valid use. – Jongware Nov 09 '18 at 11:04
  • Possible duplicate of [What are the different character sets used for?](https://stackoverflow.com/questions/27872517/what-are-the-different-character-sets-used-for) – Swordfish Nov 09 '18 at 11:04
  • @usr2564301 lol, you might want to travel back before things like google translate existed. – Swordfish Nov 09 '18 at 11:04
  • I just want to write a program for myself, for fun, you know. Of course, I write public software in English. – wheldrake Nov 09 '18 at 11:05
  • @Swordfish: I was more thinking of `τ = 2*π;`. There was a time (old) people said "why must we add support for that newfangled key `~` when you can just type in `??-` like everybody else does?". – Jongware Nov 09 '18 at 11:12
  • @wheldrake: This is possible in C++. See this question: https://stackoverflow.com/questions/52586368/can-c-variables-in-cpp-file-defined-as-special-symbols-%CE%B2/52586769#52586769 – P.W Nov 09 '18 at 11:14
  • @usr2564301 You really want to start a τ over π fight? ;) – Swordfish Nov 09 '18 at 11:14
  • Which compiler, which options? – Gerhardh Nov 09 '18 at 11:30
  • Doing this is bad practice for any purpose and _will_ come back to haunt you sooner or later. Keep identifiers and comments in English. – Lundin Nov 09 '18 at 11:33
  • @Gerhardh, compiler is GNU GCC Compiler and I didn't get what you mean by saying "which options?". – wheldrake Nov 09 '18 at 11:35
  • @Lundin, wait... Do comments have to be in English too? – wheldrake Nov 09 '18 at 11:36
  • Did you select any C standard version? C99, C11 etc.? – Gerhardh Nov 09 '18 at 11:43
  • @wheldrake If you want others to read them, then yes. For example, if you dump a piece of problematic code on SO and expect help, then provide comments in Japanese... – Lundin Nov 09 '18 at 11:43
  • From my reading of the standard, some usage of universal characters should be allowed but I need to dig further to fully understand what it means. Sorry. – Gerhardh Nov 09 '18 at 11:52
  • @Lundin, I said this is a private project for fun, I don't think anybody will see this code besides me. – wheldrake Nov 09 '18 at 11:55
  • @Gerhardh, I think it's C11. I wanted to die while trying to find out which standard version this compiler is using. – wheldrake Nov 09 '18 at 12:17
  • I like to use the "philosophy" ["UTF-8 Everywhere"](https://utf8everywhere.org/) ... except when US-ASCII is enough :) --- C source code (with comments) needs no more than ASCII. – pmg Nov 09 '18 at 12:17
  • So it wasn't you who just posted a question on SO, but a burglar or your cat walking across the keyboard or something? :) – Lundin Nov 09 '18 at 12:18
  • I just posted this to ask a question. The comments give the pronunciations of the respective variables. I thought it was simple code and a simple question for a human being to understand. – wheldrake Nov 09 '18 at 12:24
  • You might like [C99 5.2.1 Character sets](http://port70.net/~nsz/c/c99/n1256.html#5.2.1). – pmg Nov 09 '18 at 12:32
  • all the comments about keeping things "in English" are wrong; this is about the character set supported by the C language for identifiers; [a-z , A-Z , 0-9] are not the sole property of English; you can have variables like `menge` or `quantita` (fuzzing the grave accent on that final 'a'); you can also have phonetic replacements of non-Romanesque words, like `ryo`; I understand the OP's question and I understand that this is not what is being asked for; but I want to clarify, the C standard requires [a-z , A-Z , 0-9], not English – landru27 Nov 09 '18 at 15:03

2 Answers


C language doesn't appear to work this way. Is there any way to fix this?

Support for such characters is implementation-defined. Many compilers will not support this; a few might.

An identifier may contain non-digits (a-z, A-Z, _), digits (0-9), universal-character-names, or other implementation-defined characters. C17 6.4.2.1


Alternative

Since C99, code can use a universal-character-name, written \unnnn or \Unnnnnnnn: a not-so-pretty possibility.

An application would be to convert source code struct 忍び (that worked on one compiler) to struct \u5fcd\u3073 for other compilers.

https://www.branah.com/unicode-converter
忍び --> \u5fcd\u3073

#include <stdio.h>

struct \u5fcd\u3073/* Shinobi */ {
    char \u540d\u524d /* Namae */[50];
    char \u8840\u6db2\u578b /* Ketsuekigata */[3];
    char \u6027\u5225/* Seibetsu */[10];
    char \u661f\u5360\u3044/* Hoshi uranai*/[10];
    int \u5e74\u9f62/* Nenrei */;
};

int main(void) {
    struct \u5fcd\u3073 Uchiha_Itachi;
    Uchiha_Itachi.\u5e74\u9f62 = 21;
    printf("the age of itachi > %d", Uchiha_Itachi.\u5e74\u9f62);
}

Note: a define like the one below is not specified to work either. Support for it is implementation-defined.

// not certain to work
#define 忍び \u5fcd\u3073 

If a strong need exists for "any way to fix this", write your source code as a .wheldrake file and translate it to a standard .c one.


Soapbox

One character I would like to use: ≠, the not_equal sign.

chux - Reinstate Monica
  • @All, IMO, this "universal-character-name or other implementation-defined characters." is a step toward allowing gradual acceptance of non-ASCII source code evolution. Might take 20 years. – chux - Reinstate Monica Nov 09 '18 at 18:17
  • @usr2564301 Still it is not clear how this code or answer evokes "Donald Knuth's source code" or "full of arrows and other relational symbols". Skimmed [_The Art of Computer Programming_](http://broiler.astrometry.net/~kilian/The_Art_of_Computer_Programming%20-%20Vol%201.pdf) and failed to find it "full of arrows and other relational symbols" other than `=`. – chux - Reinstate Monica Nov 09 '18 at 19:13
  • Oh well. Just wanted to make a case for your `!=`. – Jongware Nov 09 '18 at 19:15
  • @usr2564301 I see. Thanks for the The Art of Computer Programming lead. Something to peruse someday. – chux - Reinstate Monica Nov 09 '18 at 19:17

It's not possible, at least there's no portable way (some compilers may allow it nonetheless, while others won't). According to this:

An Identifier can only have alphanumeric characters(a-z , A-Z , 0-9) and underscore(_).

It might however work using macros. To try that, add this line before you first use "忍び":

#define 忍び Shinobi

However, I wouldn't recommend it. It also isn't portable and some compilers might allow certain symbols while others won't.

Blaze
  • Isn't the macro name an identifier as well? – Gerhardh Nov 09 '18 at 11:13
  • Not in a C/C++ sense. It gets replaced before the compiler gets to see it. Whether that works depends on the capabilities of the preprocessor. – Swordfish Nov 09 '18 at 11:15
  • Somehow true, but in ISO/IEC 9899:2011 (E), A.3 a `control-line` is defined as a number of rules which all have an `identifier` as first token after `#define`. Therefore I would assume that the same rules apply as for other identifiers. – Gerhardh Nov 09 '18 at 11:23
  • Maybe Annex D might be the answer. Also chapter A.1.3 defines an `identifier-nondigit` where one option is a `universal-character-name` – Gerhardh Nov 09 '18 at 11:23
  • It gives the following error when I try that: "error: macro names must be identifiers". – wheldrake Nov 09 '18 at 11:27
  • That's right, my IDE even complains with "expected an identifier" when I try to define `3a` as a macro, for instance. While at the same time it doesn't care about a lot of other rules (it didn't even have any qualms with `忍び` as variable name). The reason why I mentioned macros is because some setups allow macro names that they won't allow as variable names. So if OP wants to write a program for fun, as he says, it's worth a try. – Blaze Nov 09 '18 at 11:30
  • @Gerhardh `universal-character-name` means escaped forms like `\U1234`. For `identifier-nondigit`, Japanese characters fall under the classification `other implementation-defined characters` (C17 draft 6.4.2.1). – user694733 Nov 09 '18 at 12:08
  • @user694733 OK, thanks. If they have to be escaped, I'm not sure how useful this might be... – Gerhardh Nov 09 '18 at 13:21
  • That macro identifiers are (logically, at least) replaced with other text in translation phase 4 in no way implies that they are not subject to the same constraints as other identifiers. In fact, [the lexical rules of the language](http://port70.net/~nsz/c/c11/n1570.html#6.4) do not distinguish between macro identifiers and other identifiers. – John Bollinger Nov 09 '18 at 15:51