Determine the number of characters which are allowed in a field?

Question

This is a follow-up question to "The meaning of nuc_length and uc_length parameters in PyRFC?"

With PyRFC I can get the function description like this:

get_function_description(rfc_name)

Per field I can read uc_length and nuc_length.

How can I determine the number of unicode characters which I can put into the field if nuc_length=40 and uc_length=80?

RFC_CHAR, RFC_NUM, RFC_DATE, RFC_TIME -> nuc_length (number of bytes for Non-UniCode system) always gives the maximum number of characters because in a Non-UniCode ABAP system, 1 byte = 1 character. Note that RFC_DATE is always 8 numeric characters (YYYYMMDD) and RFC_TIME is always 6 numeric characters (hhmmss). RFC_STRING is a string with a variable number of characters (maximum around 2 gigabytes). Other RFC_* types are not character fields. — Sandra Rossi, Oct 02 '19 at 15:24
I corrected the question by switching 40 and 80 (40 characters correspond to nuc_length=40 bytes and uc_length=80 bytes) — Sandra Rossi, Oct 02 '19 at 16:00
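
Following the reasoning in the comment above, the per-type rule could be sketched as a small lookup. This is only an illustration: the type names are copied from the comment as written, the strings that PyRFC actually reports for a field type may be spelled differently, and `max_chars_for_field` is a hypothetical helper, not part of PyRFC.

```python
from typing import Optional

# Type names as written in the comment above; the strings PyRFC actually
# reports may be spelled differently (assumption).
CHAR_LIKE_TYPES = {"RFC_CHAR", "RFC_NUM", "RFC_DATE", "RFC_TIME"}


def max_chars_for_field(field_type: str, nuc_length: int) -> Optional[int]:
    """Return the maximum number of characters for a character-like field.

    Returns None for RFC_STRING (variable length) and raises ValueError
    for types that are not character fields.
    """
    if field_type in CHAR_LIKE_TYPES:
        # Per the comment: in a non-Unicode system 1 byte = 1 character,
        # so nuc_length doubles as the character count.
        return nuc_length
    if field_type == "RFC_STRING":
        return None
    raise ValueError(f"{field_type} is not a character field")


print(max_chars_for_field("RFC_CHAR", 40))  # 40
```

For the field in the question (nuc_length=40, uc_length=80) this gives 40 characters, which matches uc_length / 2.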

Trixx · Answer 1 · 2019-10-02T15:36:11.123

2

Unfortunately this is not possible, I think. From the given length in bytes you can only calculate a maximum number of characters which would fit into this field.

For Unicode ABAP systems we know that SAP stores character data in code pages 4102 / 4103, which are UTF-16 (big-endian and little-endian format). That means a character needs at least 2 bytes, so the maximum length can be calculated as uc_length / 2 = 40 characters in your example. I don't think SAP uses any 4-byte characters yet, but they would be possible with code pages 4102 / 4103. It therefore depends on which Unicode characters you put in the field: fewer than 40 characters might fit into a field with uc_length=80.
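
As a rough sketch of that calculation, the check below encodes a Python string as UTF-16 and compares its byte length with uc_length. The helper names are hypothetical, and the sketch assumes the field simply holds the UTF-16 bytes with no extra overhead.

```python
def utf16_byte_length(text: str) -> int:
    """Number of bytes needed to store text as UTF-16 (no byte order mark)."""
    return len(text.encode("utf-16-le"))


def fits_unicode_field(text: str, uc_length: int) -> bool:
    """True if text fits into a field of uc_length bytes when stored as UTF-16."""
    return utf16_byte_length(text) <= uc_length


uc_length = 80
print(uc_length // 2)                                      # upper bound: 40
print(fits_unicode_field("A" * 40, uc_length))             # True, 80 bytes
print(fits_unicode_field("\U00020021" * 40, uc_length))    # False, 160 bytes
print(fits_unicode_field("\U00020021" * 20, uc_length))    # True, 80 bytes
```

Characters outside the Basic Multilingual Plane, such as U+20021, take 4 bytes each, so only 20 of them fit where 40 ordinary characters would.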

This is even more difficult with non-Unicode ABAP systems. As long as you only use code page 1100 with English logon language, a character usually needs only 1 byte. But if you use, for example, Japanese and code page 8000, text data can mix US-ASCII and Japanese characters, so the text field may contain both 1-byte and 2-byte characters. To make it even more difficult, 3-byte characters exist for all non-Unicode code pages, including code page 1100; for example, some SAP-specific icons/symbols have this length. Hence, a nuc_length=40 field can contain at most 40 characters, but only 13 (40 / 3, rounded down) in the worst case. It depends on which code page is used and which characters you put into the field.
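
The bounds for the non-Unicode case can be illustrated as follows. This is a sketch under assumptions: Python's `shift_jis` codec is used as a stand-in for code page 8000, the SAP-specific 3-byte icons are not representable in it, and both helper names are hypothetical.

```python
from typing import Tuple


def char_bounds(nuc_length: int, max_bytes_per_char: int = 3) -> Tuple[int, int]:
    """Return (worst case, best case) character counts for a byte-sized field."""
    return nuc_length // max_bytes_per_char, nuc_length


def fits_non_unicode_field(text: str, nuc_length: int, codec: str) -> bool:
    """True if text, encoded with the given codec, fits into nuc_length bytes."""
    return len(text.encode(codec)) <= nuc_length


print(char_bounds(40))  # (13, 40)

mixed = "SAP" + "日本語"  # 3 one-byte + 3 two-byte characters = 9 bytes
print(fits_non_unicode_field(mixed, 40, "shift_jis"))         # True
print(fits_non_unicode_field("日本語" * 7, 40, "shift_jis"))   # False: 21 chars, 42 bytes
print(fits_non_unicode_field("A" * 21, 40, "shift_jis"))      # True: 21 chars, 21 bytes
```

The same 21 characters fit or do not fit depending on how many of them are multi-byte, which is why only a range can be derived from nuc_length.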

edited Oct 02 '19 at 15:36

answered Oct 02 '19 at 15:27

Trixx


Nowadays, since ABAP 7.50, there are only Unicode systems. No need to discuss variable-width encodings, dixit [ABAP documentation](https://help.sap.com/doc/abapdocu_753_index_htm/7.53/en-US/index.htm?file=abentext_environment.htm): "The ABAP programming language supports a subset covered by UCS-2 and not the full UTF-16 set." – Sandra Rossi Oct 02 '19 at 18:04
ABAP works with UCS-2 (it doesn't consider a 4-bytes surrogate character as two distinct "characters") but for information the historical SAP GUI software (dynpro, ABAP list) renders the 4-bytes characters (U+10000-U+10FFFF) pretty well. For instance, the character U+20021 (encoded in UTF-16 as U+D840 & U+DC21) can be entered and rendered without any issue. It occupies internally 2 characters so in a screen field of 4 characters, the maximum number of characters that can be entered/rendered is two such characters, or one such character followed by AA, or AAAA. Tested with SAP GUI 7.50. – Sandra Rossi Oct 02 '19 at 19:06
@Sandra Rossi: Congratulations that you only have to deal with Unicode systems nowadays and do not see the need to even discuss non-unicode scenarios anymore. Maybe you don't see it, but it will take many more years until there are really no such installations anymore. Even old R/3 release 3.1 has not fully died out yet. And the Unicode option was offered as of release 6.10. – Trixx Oct 02 '19 at 19:54
@Sandra Rossi: What the ABAP language runtime can or cannot do doesn't matter. This can change with the next release. Important is in which code page the data is stored. And that is UTF-16 AFAIK. You pointed out: U+20021 is ONE single character stored with 4 bytes in UTF-16. Yes, the ABAP language currently has this UCS-2 limitation. And the SAP GUI is out of scope as well, I think. The SAP GUI also uses special front-end code pages for displaying the back-end data. If it would use code page 4103 and you have an appropriate font installed locally, it would be displayed as one single character. – Trixx Oct 02 '19 at 20:03
I agree totally. About "no need to", sorry to be too much affirmative. About the SAP GUI, I meant that SAP decided to make it interpret UTF-16 surrogate characters, they did not limit it to UCS-2, so they implicitly admit that ABAP character variables may contain UTF-16 (i.e. including 4-bytes characters). NB to everyone: I think that the ABAP documentation says that it's UCS-2 because some ABAP statements like "substring" are problematic with 4-bytes characters. Erratum in my last comment: "ABAP works with UCS-2 (it DOES consider a 4-bytes surrogate character as two distinct "characters")" – Sandra Rossi Oct 03 '19 at 04:17
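
The surrogate-pair arithmetic discussed in these comments can be reproduced in Python. This is a minimal sketch: `utf16_units` and `ucs2_length` are hypothetical helpers, and "length" here means the number of 16-bit code units, which is how the comments describe ABAP counting characters.

```python
from typing import List


def utf16_units(text: str) -> List[int]:
    """Return the 16-bit UTF-16 code units of text as integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def ucs2_length(text: str) -> int:
    """Length of text when every 16-bit code unit counts as one character."""
    return len(utf16_units(text))


ch = "\U00020021"
print([f"U+{unit:04X}" for unit in utf16_units(ch)])  # ['U+D840', 'U+DC21']

# A screen field of 4 "characters" in this counting:
print(ucs2_length(ch * 2))        # 4
print(ucs2_length(ch + "AA"))     # 4
print(ucs2_length("AAAA"))        # 4
print(ucs2_length(ch * 2 + "A"))  # 5, would not fit
```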
