The meaning of nuc_length and uc_length parameters in PyRFC?

Question

My favorite search engine (Ecosia) could not find the canonical upstream (SAP) documentation on the meaning of uc_length vs. nuc_length.

What is the difference between both?

I get these parameters with a modified version of clientPrintDescription.py

Sandra Rossi · Accepted Answer · 2019-10-02T18:11:25.817


I can't be sure what they do in the Python script, but based on my ABAP/SAP knowledge, I can easily say that:

nuc_length: length of the parameter in a non-Unicode ABAP-based system, in number of bytes
uc_length: length of the parameter in a Unicode ABAP-based system, in number of bytes

A non-Unicode ABAP-based system uses one byte to encode each character, while a Unicode ABAP-based system uses two bytes to encode each character. As of ABAP 7.50, all systems are Unicode.

In a Unicode ABAP-based system, character strings and text fields can store Unicode characters from U+0000 to U+FFFF. Note that the surrogate code points U+D800 to U+DFFF are treated as ordinary characters by the ABAP runtime environment (to quote the ABAP documentation: "The ABAP programming language supports a subset covered by UCS-2 and not the full UTF-16 set.").

Note that structured parameters consist of several fields, which can mix character and non-character types. The "uc_length" doubles the number of bytes only for the character fields. There are also some padding bytes between fields because of alignment.

In your example, a text field of 80 bytes in a Unicode system corresponds to 40 characters.
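The arithmetic above can be sketched in Python. This is only an illustration, assuming a pure character field and the fixed two bytes per character described in this answer; the helper name `max_chars_unicode` is hypothetical and not part of PyRFC.

```python
def max_chars_unicode(uc_length: int, bytes_per_char: int = 2) -> int:
    """Upper bound on the number of characters in a pure character field.

    Assumes every character occupies exactly `bytes_per_char` bytes,
    as described above for Unicode ABAP-based systems (UCS-2 subset).
    Not valid for structures that mix character and non-character fields.
    """
    if uc_length < 0 or bytes_per_char <= 0:
        raise ValueError("lengths must be positive")
    return uc_length // bytes_per_char


# uc_length reported for the text field in the question
print(max_chars_unicode(80))
```

For a structured parameter this division does not apply, because only the character fields double in size and alignment bytes may be added.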

edited Oct 02 '19 at 18:11

answered Oct 02 '19 at 13:06


I understand, but it is not straightforward. My SAP server is Unicode-based. I look at the corresponding uc_length and read 80. But modern languages don't use bytes for strings; they use some high-level Unicode abstraction. As a developer I want to know: How many characters can I put into this field? Of course I can do the math `80 / 2 ===> 40`. I know what UTF-16 is, but ... Why not tell the developer the number of Unicode characters he can transfer? I know you are not the inventor of this. Thank you very much for your answer. I just had to write what was on my mind. – guettli Oct 02 '19 at 13:28

@guettli: it's not that easy. The length in bytes is the only thing that can be returned reliably while staying technically correct. How many characters fit in there depends on which code page is used. In UTF-16, a character always needs at least 2 bytes, but can also require a multiple of 2 bytes. The same applies to non-Unicode lengths. While you usually only need 1 byte for a standard US-ASCII character, you will need more bytes for a Japanese or Chinese character. What you would like to have, i.e. the maximum number of characters, cannot be given because it depends on the code page. – Trixx Oct 02 '19 at 14:03
@Trixx you say it is not that easy. I hope it is possible. I created a new question "How to determine the number of characters which are allowed in a field?" https://stackoverflow.com/questions/58203229/sap-rfc-how-to-determine-the-number-of-characters-which-are-allowed-in-a-field – guettli Oct 02 '19 at 14:08
@Trixx is talking about [variable-width encoding](https://en.wikipedia.org/wiki/Variable-width_encoding#Unicode_variable-width_encodings) (aka DBCS/MBCS). I'm not aware that ABAP takes such characters into account (a first character U+D800-U+DFFF followed by another character would correspond to a U+010000-U+10FFFF character). The ABAP documentation talks more about UCS-2 (which ignores U+D800-U+DFFF) than UTF-16. What can be said for sure is that a text field of 80 bytes in a Unicode system holds at most 40 characters. – Sandra Rossi Oct 02 '19 at 15:17

So far, I also only know of UCS-2 characters being used. However, Unicode ABAP systems use either code page 4102 or 4103, which is UTF-16. So in theory this may change in the future, and SAP could also store 4-byte characters. The real UCS-2 code pages (only allowing 2-byte characters) would be code pages 4100 and 4101 instead. – Trixx Oct 02 '19 at 15:29
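The UCS-2 vs. UTF-16 distinction discussed in these comments can be illustrated with plain Python, independent of PyRFC or any SAP system. The helper names below are hypothetical. The sketch only shows how many bytes Python's own UTF-16 codec produces; how an ABAP system actually treats characters outside U+0000-U+FFFF is not verified here.

```python
def utf16_byte_length(text: str) -> int:
    """Number of bytes needed to encode `text` as UTF-16 (without a BOM)."""
    # Byte order does not affect the length, so little-endian is used here.
    return len(text.encode("utf-16-le"))


def fits_in_field(text: str, uc_length: int) -> bool:
    """Check whether `text` fits into a field of `uc_length` bytes."""
    return utf16_byte_length(text) <= uc_length


bmp_char = "A"               # U+0041, inside U+0000-U+FFFF
non_bmp_char = "\U0001F600"  # U+1F600, needs a surrogate pair in UTF-16

print(utf16_byte_length(bmp_char))      # 2 bytes
print(utf16_byte_length(non_bmp_char))  # 4 bytes
print(fits_in_field("A" * 40, 80))      # 40 two-byte characters fill 80 bytes
print(fits_in_field(non_bmp_char * 40, 80))
```

So "at most 40 characters" holds for an 80-byte field, but a string containing surrogate pairs can run out of space with fewer than 40 characters.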
You also need to understand: the underlying C library also still supports Non-Unicode SAP systems. And even here it was not always clear, how many characters would fit into a parameter... For example, if the SAP system is running with CP 8000 (Shift-JIS Japanese), then an ABAP parameter of type CHAR10 can only be guaranteed to hold 5 characters... (So ABAP in fact is completely incorrect here: data type "CHAR10" means 10 bytes -- not 10 characters!!!) So what could be done when translating the ABAP world to C/C++? We took the only thing that was "reliable": the length in bytes... – Lanzelot Feb 11 '20 at 14:31
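The CHAR10 example from the last comment can be reproduced in plain Python. This is a sketch that assumes Python's `shift_jis` codec is a reasonable stand-in for SAP code page 8000; the two are not guaranteed to be identical for every character, and the helper name is hypothetical.

```python
def encoded_length(text: str, codec: str) -> int:
    """Number of bytes `text` occupies in the given code page."""
    return len(text.encode(codec))


FIELD_BYTES = 10  # a CHAR10 field on a non-Unicode system

ascii_text = "ABCDEFGHIJ"   # 10 single-byte characters
japanese_text = "こんにちは"  # 5 hiragana characters

for sample in (ascii_text, japanese_text):
    used = encoded_length(sample, "shift_jis")
    print(f"{len(sample)} characters -> {used} bytes, fits: {used <= FIELD_BYTES}")
```

Both strings fill the same 10 bytes, although one holds 10 characters and the other only 5, which is why the byte length is the only figure that can be reported reliably.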
