The question "How to stringify a string which contains a comma?" describes how to stringify a string.

However, it does not work with some special characters, for instance:

#include <Arduino.h>
#define TOSTR_(...) #__VA_ARGS__
#define STRINGIFY(...) TOSTR_(__VA_ARGS__)

const char htmlRootPage[] PROGMEM =
STRINGIFY(<input name="txtGt" type="number" value="39.5" max="42" step="0.5" style="width:160px;">°C<br>)
;

void setup() {
}

void loop() {
}

Here the degree character ° is rejected; I get:

6:99: error: extended character ° is not valid in an identifier
    6 | STRINGIFY(<input name="txtGt" type="number" value="39.5" max="42" step="0.5" style="width:160px;">°C<br>)
      |                                                                                                   ^
exit status 1
extended character ° is not valid in an identifier

As an experiment, when I replace the degree character with:

ρΨψλω àäâéèêëïîöôóíùüû ES_áñ DE_ß HU_őű NOK_åæø CZK_úůýžáčďéěíňóřšť PL(check accent)_ąćęłńśźż RO_ăâîşșţ RU_ёяшертыуиопющэъжьлкйчгфдсазхцвбнм

it does compile.

However, with other characters such as

¿¡«»

I get the same compiler error: extended character is not valid in an identifier.

I may be wrong, but it seems to me that it ought to accept any UTF-8 character (the example above with non-English European characters shows that it accepts some) until it reaches the closing parenthesis of the STRINGIFY macro invocation, but oddly some characters seem to cause an issue.

The code is built with Arduino IDE 1.8.19 https://arduino.github.io/arduino-cli/0.32/sketch-build-process/

The compiler is the following:

$ avr-gcc -v
Using built-in specs.
Reading specs from /usr/lib/gcc/avr/12.2.0/device-specs/specs-avr2
COLLECT_GCC=avr-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/avr/12.2.0/lto-wrapper
Target: avr
Configured with: /build/avr-gcc/src/gcc-12.2.0/configure --disable-install-libiberty --disable-libssp --disable-libstdcxx-pch --disable-libunwind-exceptions --disable-linker-build-id --disable-nls --disable-werror --disable-__cxa_atexit --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gold --enable-languages=c,c++ --enable-ld=default --enable-lto --enable-plugin --enable-shared --infodir=/usr/share/info --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --prefix=/usr --target=avr --with-as=/usr/bin/avr-as --with-gnu-as --with-gnu-ld --with-ld=/usr/bin/avr-ld --with-plugin-ld=ld.gold --with-system-zlib --with-isl --enable-gnu-indirect-function
Thread model: single
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (GCC)

Why do you need the macro at all? Can't you simply provide an actual string?

Macro, because it's to serve an html page from the server on the IoT device: most of the HTML, CSS, and JS code is shared, but parts are selected with #if / #else / #endif according to the actual hardware and sensors of each device.

IoT devices are limited, particularly in RAM (kilobytes, not gigabytes as on a PC). I stringify a whole html page, and I just noticed that oddly some characters do not get through.

The binary is then uploaded to an IoT device, not a PC, for execution.

Thanks

user2718593
  • Works fine for me. Please post the _full_ code and compiler and compiler version and compiler options you are using. – KamilCuk May 23 '23 at 21:41
  • What file encoding is your editor using? What encoding is your compiler expecting? – tadman May 23 '23 at 21:43
  • When dealing with characters like this, it is a bad idea to rely on the encoding used by the _source editor_ since that may not be the same used by the executable. Use hex escape sequences instead `"\x42"` where 42 is the number in hex for the symbol you want. – Lundin May 24 '23 at 06:53
  • Why do you need the macro at all? Can't you simply provide an actual string? – the busybee May 25 '23 at 06:02
  • Your explanation doesn't really answer thebusybee's question: Why cannot you simply write `const char htmlRootPage[] PROGMEM = " – user694733 May 25 '23 at 13:42
  • > No need to stringify anything if it's already a string. It's not a string is html coding, have ever look at an html page source code, I can't simply add begin and end quote to an html source page, itseft contains single, double quotes, semi colon for Javascript or css ...I'm just illustring the cause of the problem, to make it reproducable. – user2718593 May 25 '23 at 13:47
  • But what do you gain in using the macro? I don't see any between `STRINGIFY()` and `""`. You might want to extend your [mre] to explain the issue. You can embed double quotes with the backslash. – the busybee May 25 '23 at 13:58
  • @the busybee, it seems you never programmed a html site. Take for instance a wikipedia source page and amuse yourself in adding backslash and quotes to pass it to C string... – user2718593 May 25 '23 at 14:06
  • No problem, I'd copy'n'paste into a pair of double quotes, using a serious IDE. Et voilà. -- Anyway, this involves manual work, which is error-prone. In similar cases I use objcopy to produce an object file with the content of the HTML file as an array of characters. -- And I'm sure there are more alternatives. It seems, we have a classical XY problem here. -- Anyway, since the parser tries to understand what you have in the pair of parentheses, you simply cannot pass such characters. Are you open to alternative solutions? – the busybee May 25 '23 at 20:17
  • Oh, by accident (a late acceptance of an answer of mine) I just looked at [this question](https://arduino.stackexchange.com/q/90298/60431). Neither its issue nor my answer is relevant, but the way the OP inserted the HTML content. You might find this interesting. I did not know of `R"delimiter(...)delimiter"` for raw string literals. – the busybee May 25 '23 at 20:24
  • @the busybee >Are you open to alternative solutions? SURE, esp. as I haven't found good/working one so far. – user2718593 May 25 '23 at 20:29
  • Did you follow the link in my second comment? Did you find the raw string literal? Did you try to apply that to your application? (These are all more or less rhetorical questions, no need to react. To get your question re-opened, [edit] it and add all clarifications that are in the comments currently, then hope the best.) – the busybee May 25 '23 at 20:34
  • >R"delimiter(...)delimiter" I do know, but I try concat const string which are constructed through #if #else clause, and that's the issue, can't put the prama into "delimiter(...)delimiter" as it is as expected treated as string and not preprocessor instruction, that's the very reason of trying stringifying but seems to cause problem(bug?) with some chars. See https://forum.arduino.cc/t/progmem-how-to-add-compile-if-within-progmem-pragma-compiler-if/1129390/4 – user2718593 May 25 '23 at 20:36
  • So _please_ [edit] your question clarifying that your real issue is concatenation of raw string literals depending on preprocessor conditionals. As I assumed, an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). -- Oh, BTW, isn't the professional way to have non-ASCII characters in HTML by HTML entities? So the correct(TM) way to place a degree character is `&deg;`. – the busybee May 26 '23 at 05:45
  • >So please edit you question clarifying that your real issue is concatenation of raw string literals depending on preprocessor conditionals. No the issue remains, try to stringify https://en.wikipedia.org/wiki/Temperature >So the correct(TM) way to place a degree character is &deg; only if you do want to write the all internet in non sense ascii, solution . stringify ought to work for any string from begin to end, preprocessor does not interpret the string. Thanks for suggestion – user2718593 May 26 '23 at 07:40
  • This is your misconception: "_stringify ought to work for any string from begin to end, preprocessor does not interpret the string._" The preprocessor works on _preprocessor tokens_, and you cannot pass any deliberate character sequence as argument to a macro. In consequence, your approach to use `STRINGIFY()` is doomed to fail. – the busybee May 26 '23 at 10:30

2 Answers

How to stringify a string which contains extended characters, such as degree character ° -

Without double expansion.

#define STRINGIFY(...) #__VA_ARGS__
const char htmlRootPage[] = STRINGIFY(<input name="txtGt" type="number" value="39.5" max="42" step="0.5" style="width:160px;">°C<br>);

Or replace the character with a macro, delaying the expansion.

#define TOSTR_(...) #__VA_ARGS__
#define STRINGIFY(...) TOSTR_(__VA_ARGS__)
#define DOT °
const char htmlRootPage[] = STRINGIFY(<input name="txtGt" type="number" value="39.5" max="42" step="0.5" style="width:160px;">DOT C<br>);

Macro, because it's to serve html page to the server on IoT,

So do not use double expansion. However, if there are macros in the html page to be expanded, then there can't be invalid identifiers.
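To make the difference between the two forms concrete, here is a minimal sketch meant for a desktop compiler. The macro names STR1, STR2 and MAX_TEMP are made up for this illustration and are not from the question. It only shows that single-level stringification keeps a macro name as text, while the two-level form expands it first.

```cpp
#include <cassert>
#include <cstring>

// Single level: the argument is stringified as written, no macro expansion.
#define STR1(...) #__VA_ARGS__

// Two levels: the argument is macro-expanded before it is stringified.
#define TOSTR_(...) #__VA_ARGS__
#define STR2(...) TOSTR_(__VA_ARGS__)

// Hypothetical value that the page is supposed to contain.
#define MAX_TEMP 42

const char literalText[]  = STR1(max=MAX_TEMP);  // macro name kept as text
const char expandedText[] = STR2(max=MAX_TEMP);  // macro replaced by its value

// Host-side check of both results; call it from main() or a test.
inline void checkStringify() {
    assert(std::strcmp(literalText, "max=MAX_TEMP") == 0);
    assert(std::strcmp(expandedText, "max=42") == 0);
}
```

So the two-level form is only needed when the text contains macros that must be expanded.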

However, I do not get the reasoning. Consider using an actual string instead of a macro.

You are using full C++. Use a raw string literal.
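A possible way to combine raw string literals with #if is to split the page into several adjacent raw literals and put the directives between them, since adjacent string literals are concatenated after preprocessing. This is only a sketch: the flag HAS_TEMP_SENSOR, the delimiter HTML and the fallback paragraph are assumptions, and it assumes a UTF-8 source file.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical build flag selecting the hardware variant.
#define HAS_TEMP_SENSOR 1

// Each piece is a raw string literal, so quotes, semicolons and the
// degree sign need no escaping. The directives sit between the literals,
// not inside them, so the preprocessor still sees them.
const char htmlRootPage[] =
    R"HTML(<form>)HTML"
#if HAS_TEMP_SENSOR
    R"HTML(<input name="txtGt" type="number" value="39.5" max="42" step="0.5" style="width:160px;">°C<br>)HTML"
#else
    R"HTML(<p>no temperature sensor</p>)HTML"
#endif
    R"HTML(</form>)HTML";

// Host-side check that the selected pieces were joined into one string.
inline void checkPage() {
    assert(std::strstr(htmlRootPage, "<form><input name=\"txtGt\"") == htmlRootPage);
    assert(std::strstr(htmlRootPage, "<br></form>") != nullptr);
}
```

On AVR, the PROGMEM attribute from the question would go after `htmlRootPage[]` as before.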

I stringify a all html page

If you want to convert a file to a string, use a program to generate C source code. For exactly this purpose, xxd has been used for decades. (Newer work has been done with the #embed preprocessor directive in C23, but I do not know the status of compiler support for it https://thephd.dev/finally-embed-in-c23 ).
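For illustration, this is roughly the shape of the code that `xxd -i index.html` generates. The file name index.html and its ten-byte content `<b>°C</b>` (UTF-8) are assumptions, and the bytes below were written by hand rather than copied from a real run.

```cpp
#include <cassert>

// Array and length names are derived by xxd from the input file name.
unsigned char index_html[] = {
  0x3c, 0x62, 0x3e, 0xc2, 0xb0, 0x43, 0x3c, 0x2f, 0x62, 0x3e
};
unsigned int index_html_len = 10;

// Host-side check: the degree sign is just the two bytes 0xC2 0xB0 here,
// so the preprocessor never has to tokenize it.
inline void checkEmbedded() {
    assert(index_html_len == sizeof(index_html));
    assert(index_html[3] == 0xc2 && index_html[4] == 0xb0);
}
```

Note that the array is not NUL-terminated, so the length variable has to be used when sending the page. For flash storage on AVR the generated declaration would probably need `const` and `PROGMEM` added, for example by a small build step.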

KamilCuk

error: extended character ° is not valid in an identifier

Identifiers in C are formed by a starting character (which must be an alphabetic character or the _ character), followed by more alphabetic characters and/or decimal digits. This includes the accented alphabetic characters of the extended character set (as you show in your question), but the degree sign is not in the set of so-called alphabetic characters. Unicode classifies it as a symbol rather than a letter, so it is not allowed in an identifier.

If you want your code to be readable and portable, I suggest you restrict yourself to the ASCII character set. There is no real need to name your identifiers using national characters: it makes your code difficult to read (for almost everyone except developers who share your locale) and it will not be handled correctly by all compiler installations worldwide.

One of the issues you will have, if you insist on using national character extensions, is that your source files will have to travel everywhere with a label that tells people which encoding they are written in. If you edit your sources with an editor that changes the character set to another one, you can end up making your sources uncompilable (e.g. if you use UTF-8 encoding but your sources get converted to ISO-8859-1 in the course of an edit).
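As a small sketch of the encoding point, the degree sign can be spelled with hex escapes so that the bytes in the binary no longer depend on how the editor saved the file. The variable name degreeC is made up for this example, and UTF-8 is assumed to be the encoding the browser expects.

```cpp
#include <cassert>

// U+00B0 DEGREE SIGN is the two bytes 0xC2 0xB0 in UTF-8 (a single
// byte 0xB0 in ISO-8859-1). The literal is split before "C" because
// "\xB0C" would be read as one over-long hex escape.
const char degreeC[] = "\xC2\xB0" "C";

static_assert(sizeof(degreeC) == 4, "two bytes for the sign, 'C', and the NUL");

// Host-side check of the exact bytes stored in the array.
inline void checkDegree() {
    assert(static_cast<unsigned char>(degreeC[0]) == 0xC2);
    assert(static_cast<unsigned char>(degreeC[1]) == 0xB0);
    assert(degreeC[2] == 'C');
}
```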

Luis Colorado