In XS I can pass the length of a string argument to a C function using the length keyword:

static int foo(const char *s, size_t len)
{
    return 1;
}

MODULE = Foo        PACKAGE = Foo

void
foo(char *s, STRLEN length(s))

However, how can I get the length of the string if I need it inside the CODE block?

void
foo(char *s, STRLEN length(s))
CODE:
    ...
    foo(s, ???);

I can use the STRLEN_length_of_s or XSauto_length_of_s variable autogenerated by xsubpp, but that feels a bit hardcoded. Is there a way (possibly a predefined macro) to get the variable name? Or can I assign my own name to the length argument? Or do I need to resort to declaring the argument as SV * and getting the length myself with SvPV in the CODE section?

manison

1 Answer

First of all, if you're using char* in your XS prototype (and the default typemap), your code is buggy.

You want

void
foo(SV* sv)
PREINIT:
    STRLEN len;
    char* s;
CODE:
    s = SvPVbyte(sv, len);
    foo(s, len);

or

void
foo(SV* sv)
PREINIT:
    STRLEN len;
    char* s;
CODE:
    s = SvPVutf8(sv, len);
    foo(s, len);

Remember that Perl strings are sequences of 32-bit or 64-bit numbers, while C strings are sequences of 8-bit chars.[1] Some conversion needs to occur, and you need to specify which one.

In the first case, each character of the string will be a char of s.

In the second case, s will point to the Perl string encoded as UTF-8.
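
To make the difference concrete, here is a minimal stand-alone C sketch. It does not use the Perl API at all: `encode_bytes` and `encode_utf8` are hypothetical helpers written only for illustration, and the string is modelled as an array of code points. It shows the two C buffers you could end up with for the same one-character string U+00E9.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of perl): the "byte" view, one char per
 * code point. Returns the number of bytes written, or (size_t)-1 if a
 * code point does not fit in 8 bits. out must have room for n bytes. */
static size_t encode_bytes(const uint32_t *cps, size_t n, unsigned char *out)
{
    assert(cps != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > 0xFF)
            return (size_t)-1;          /* cannot be represented as one byte */
        out[i] = (unsigned char)cps[i];
    }
    return n;
}

/* Hypothetical helper (not part of perl): the UTF-8 view. Assumes every
 * code point is at most 0x10FFFF. out must have room for 4 * n bytes.
 * Returns the number of bytes written. */
static size_t encode_utf8(const uint32_t *cps, size_t n, unsigned char *out)
{
    size_t len = 0;
    assert(cps != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t c = cps[i];
        if (c < 0x80) {
            out[len++] = (unsigned char)c;
        } else if (c < 0x800) {
            out[len++] = (unsigned char)(0xC0 | (c >> 6));
            out[len++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out[len++] = (unsigned char)(0xE0 | (c >> 12));
            out[len++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[len++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[len++] = (unsigned char)(0xF0 | (c >> 18));
            out[len++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[len++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[len++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return len;
}
```

For the single code point 0xE9, the first helper writes the one byte E9 and the second writes the two bytes C3 A9. A C function handed the buffer cannot tell which of the two it received, so the choice has to be made explicitly on the XS side.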


  1. Technically, a char can be larger than 8 bits, but I don't think it can be on systems supported by perl.
ikegami
  • You are saying that one should always explicitly choose between byte or utf-8 representation depending on the underlying C function? The default typemap uses SvPV which is not suitable for representing strings, right? So your preferred way is to work directly with SV in argument and retrieve the pointer and length in the PREINIT section. The length keyword is not the right way to do it, I assume. – manison Mar 07 '18 at 18:47
  • Re "*You are saying that one should always explicitly choose between byte or utf-8 representation depending on the underlying C function?*", There's a third option: Using `SvPV` along with `SvUTF8` to determine the encoding of the buffer returned by `SvPV`. – ikegami Mar 07 '18 at 19:22
  • Re "*The default typemap uses SvPV which is not suitable for representing strings, right?*", When using `SvPV` without `SvUTF8`, you don't know what you have. For example, `"\xE9"` in Perl could equally result in `"\xE9\x00"` or `"\xC3\xA9\x00"` in C when using `SvPV`. – ikegami Mar 07 '18 at 19:25
  • Re "*So your preferred way*", Adjusting the typemap is equally acceptable. – ikegami Mar 07 '18 at 19:25
  • I had to move the `SvPVxxx` call into the `INIT` section, since in the `PREINIT` the SV argument is not popped from the stack yet, causing the _'sv' was not declared in this scope_ error. – manison Mar 08 '18 at 07:41
  • Why should `char *` in an XS prototype be buggy? – nwellnhof Mar 08 '18 at 12:38
  • @nwellnhof, It shouldn't be, but people make mistakes. – ikegami Mar 08 '18 at 16:41
  • @manison, Ah ok. Fixed by moving it to the `CODE` section. Note that you must leave the declaration in the `PREINIT` section. (Or you could move it into curlies in your `CODE` section, e.g. `CODE: { STRLEN len; char* s = SvPVutf8(sv, len); ... }`.) – ikegami Mar 08 '18 at 16:56