
I have the following code on Linux:

rc = iconv_open("WCHAR_T", SourceCode);

prior to using iconv to convert the data into a wide character string (wchar_t).

I am now compiling this on z/OS and do not know what value to use in place of "WCHAR_T". I have found that code pages are represented by 5-digit character strings on z/OS (e.g. code page 500 would be "00500"), so I am happy with what to put into my SourceCode variable above. I just can't find a value that works as the first parameter to iconv_open.

wchar_t is 4 bytes long on z/OS (when compiling 64-bit, as I am), so I assume I need some variant of an EBCDIC equivalent to UTF-32 or UCS-4, but I cannot find one that works. Every combination I have tried so far has failed with an errno of 121 (EINVAL: The parameter is incorrect).

If anyone familiar with how the above code works on Linux could give a summary of what it does, that might also help. What does it mean to iconv into "WCHAR_T"? Is it perhaps a combination of some data conversion and a type change to wchar_t?

Alternatively, can anyone answer the question, "What is the internal representation of wchar_t on z/OS?"

Morag Hughson
  • I'm told there isn't enough information in this to answer. Are you planning on processing EBCDIC data with this? – Kevin McKenzie May 19 '20 at 11:48
  • I am planning on processing data that could be in any code page, EBCDIC or ASCII. I give one example, but the incoming data could be in any codepage. – Morag Hughson May 20 '20 at 09:42
  • I suppose the other way of asking my question is, "What is the internal representation of wchar_t on z/OS?" – Morag Hughson May 22 '20 at 23:47

1 Answer


wchar_t is an implementation-defined data type. On z/OS it is 2 bytes in 31-bit mode and 4 bytes in 64-bit mode.
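A quick way to confirm the width for a given compilation mode is to ask the compiler. This is only a minimal sketch; the function names `wchar_bits` and `report_wchar_width` are made up for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Width of wchar_t in bits for the mode this file was compiled in.
   (Hypothetical helper name, used only for this illustration.) */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Prints the width so it can be compared across 31-bit and 64-bit builds. */
void report_wchar_width(void)
{
    printf("wchar_t is %zu bytes (%zu bits)\n",
           sizeof(wchar_t), wchar_bits());
}
```

Calling `report_wchar_width()` from a 31-bit build and a 64-bit build should show the two sizes described above. Note that this only tells you the size, not the encoding stored in it.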

There is no single representation of wchar_t on z/OS. The encoding associated with the wchar_t data is dependent on the locale in which the application is running. It could be an IBM-939 Japanese DBCS code page or any of the other DBCS code pages that are used in countries like China, Korea, etc.

Wide string literals and character constants, i.e. those written as L"abc" or L'x', are converted to the implementation-defined encoding used to implement the wchar_t data type. This encoding is locale-sensitive and can be manipulated using the wide-character run-time library functions.

The conversion of multibyte strings to wide strings is typically done by calling one of the mbtowc family of run-time library functions, which respect the encoding associated with the locale in which the application is running.
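As a rough illustration of that locale-driven path, the sketch below wraps the standard `mbstowcs` function. The wrapper name `to_wide` is hypothetical, and the values that end up in the output buffer depend on the locale selected with `setlocale`, which is exactly why they cannot be predicted in general.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Converts the multibyte string `mb` to wide characters using the encoding
   of the current LC_CTYPE locale. Writes at most out_cap - 1 wide characters
   plus a terminator. Returns the number of wide characters written, or
   (size_t)-1 if the input is invalid in the current locale or out_cap is 0.
   (Hypothetical helper name, used only for this illustration.) */
size_t to_wide(const char *mb, wchar_t *out, size_t out_cap)
{
    if (out_cap == 0) {
        return (size_t)-1;
    }
    size_t n = mbstowcs(out, mb, out_cap - 1);
    if (n == (size_t)-1) {
        return n;               /* invalid multibyte sequence */
    }
    out[n] = L'\0';             /* mbstowcs may not terminate a full buffer */
    return n;
}
```

An application would call `setlocale(LC_ALL, "")` (or name a specific locale) before `to_wide`, so that the run-time library knows which multibyte encoding to expect.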

iconv, on the other hand, can be used to convert any string to any of the supported destination code pages, including double-byte code pages and the Unicode formats (UTF-8, UTF-16, UTF-32). The operation of iconv is independent of the wchar_t type.
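To show iconv producing an explicitly named Unicode format rather than "whatever wchar_t is", here is a minimal sketch. The helper name `to_utf32` is hypothetical, and the code set names in the usage note ("UTF-32BE", "ISO-8859-1") are the ones glibc accepts on Linux. I am assuming a converter for the chosen pair is installed; z/OS uses different names, so check what your system actually provides.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/* Converts in_len bytes encoded in code set `from` to UTF-32 code points.
   Returns the number of code points stored in `out`, or (size_t)-1 if the
   converter cannot be opened or the conversion fails.
   (Hypothetical helper name; "UTF-32BE" is the glibc spelling.) */
size_t to_utf32(const char *from, const char *in, size_t in_len,
                uint32_t *out, size_t out_cap)
{
    iconv_t cd = iconv_open("UTF-32BE", from);
    if (cd == (iconv_t)-1) {
        return (size_t)-1;
    }

    char *src = (char *)in;     /* iconv takes a non-const pointer */
    char *dst = (char *)out;
    size_t src_left = in_len;
    size_t dst_left = out_cap * sizeof(uint32_t);

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        return (size_t)-1;
    }

    /* The converter wrote big-endian bytes; assemble each group of four
       into a native integer so the result is byte-order independent. */
    size_t count = out_cap - dst_left / sizeof(uint32_t);
    const unsigned char *bytes = (const unsigned char *)out;
    for (size_t i = 0; i < count; i++) {
        const unsigned char *b = bytes + 4 * i;
        out[i] = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                 ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    }
    return count;
}
```

On Linux, `to_utf32("ISO-8859-1", "A\xE9", 2, buf, 4)` should store the code points U+0041 and U+00E9. Nothing in this path involves wchar_t or the current locale.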

Universal coded character set converters may be the answer to your question.

The closest thing to Unicode on z/OS would be UTF-EBCDIC, but using it requires defining locales that are based on UTF-EBCDIC.

If running as an ASCII application is an option, you could use UTF-32 as the internal encoding and provide iconv converters to and from each of the EBCDIC code pages your application needs to support. The char32_t data type would serve this better than wchar_t, because it avoids the opacity of wchar_t.
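If no direct converter to UTF-32 is available, one possible route is to convert to UTF-8 with an existing iconv converter and then decode the UTF-8 algorithmically into a char32_t buffer. The following is a sketch of that second step only. The function name `utf8_to_utf32` is hypothetical, and it assumes a C11 compiler that provides `<uchar.h>`.

```c
#include <stddef.h>
#include <uchar.h>

/* Decodes in_len bytes of UTF-8 into UTF-32 code points.
   Returns the number of code points written, or (size_t)-1 if the input is
   malformed or `out` is too small.
   (Hypothetical helper name, used only for this illustration.) */
size_t utf8_to_utf32(const unsigned char *in, size_t in_len,
                     char32_t *out, size_t out_cap)
{
    /* Smallest code point allowed for each sequence length; used to
       reject overlong encodings. */
    static const char32_t min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    size_t i = 0, n = 0;

    while (i < in_len) {
        unsigned char b = in[i];
        char32_t cp;
        size_t extra;                       /* continuation bytes expected */

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;             /* invalid lead byte */

        if (extra > in_len - i - 1 || n == out_cap) {
            return (size_t)-1;              /* truncated input or full buffer */
        }
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = in[i + k];
            if ((c & 0xC0) != 0x80) {
                return (size_t)-1;          /* not a continuation byte */
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            return (size_t)-1;              /* overlong, out of range, or surrogate */
        }
        out[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

Because the result is plain UTF-32 in a char32_t array, its meaning does not change with the locale or the addressing mode.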

Milos Lalovic
  • Thank you for your answer - however it does not actually address what I should put in the parameter on `iconv_open` instead of "WCHAR_T"? – Morag Hughson May 20 '20 at 09:43
  • Hi there, thank you for the update. Are you perhaps trying to tell me that there isn't an equivalent to "WCHAR_T" as a parameter to `iconv_open` when on z/OS? – Morag Hughson May 21 '20 at 11:42
  • If "WCHAR_T" means UTF32 you will have to generate your own iconv converter using the uconvdef tool. An alternative would be to use the existing iconv converter for UTF-8 and do algorithmic conversion to UTF32 from UTF-8. – Milos Lalovic May 21 '20 at 13:16
  • I don't know what "WCHAR_T" means - that is what my question is asking. – Morag Hughson May 21 '20 at 23:23
  • @MoragHughson The key point is in the first sentence: what `wchar_t` means is implementation- and locale-dependent, so we can't say (at least without exact information about compiler and locale-settings used)... – piet.t May 22 '20 at 06:37
  • I understand that `wchar_t` is implementation dependent, but I'm not asking about the type, I'm asking what to use instead of `"WCHAR_T"` as the first parameter in `iconv_open`. Is the fact that there are no answers yet to that suggesting that there is no equivalent? – Morag Hughson May 22 '20 at 23:32
  • I suppose the other way of asking my question is, "What is the internal representation of wchar_t on z/OS?" – Morag Hughson May 22 '20 at 23:48
  • Hi - I notice you have edited your answer again (SO seems to leave discovering that fact as a random thing - no notification of such!). I see the statement "There is no single representation of wchar_t on z/OS". Do you happen to know whether that is also a true statement on Linux? It would really help me to understand that. – Morag Hughson May 26 '20 at 03:06
  • I am not a Linux developer so I cannot say anything from personal experience, but Linux is an ASCII-based platform and UTF32 is a perfect choice for wchar_t. – Milos Lalovic May 26 '20 at 13:06
  • I have edited your answer to put the pertinent answer nearer the beginning and in bold, and am going to award you the bounty. However, I don't feel like I am any closer to understanding what I need to do now wrt the question's line of code. Perhaps I don't understand my question well enough either! :-) – Morag Hughson May 26 '20 at 22:50
  • P.S. in case you're interested, I have opened a new question (https://stackoverflow.com/questions/62032729/using-iconv-with-wchar-t-on-linux) to see if I can attack this from the opposite direction. – Morag Hughson May 26 '20 at 22:57