
I recently needed a list of compiled-in signal names so I could print nice messages like "Interrupted by SIGINT (2)".

get_defined_constants() is unusable for this, as it jumbles SIGINT, SIGTRAP, etc. in amongst totally unrelated definitions (with the same integer values).

The signal names map to different values depending on the OS, and sometimes they're not all compiled into PHP, so the most straightforward clean solution would be a new function that just returns an array of compiled-in signal names.

Hmm... a function that returns a static array back to PHP userspace... that sounds like a really good first sourcecode-hacking project, right?

Nope :)


The code below (a bit further down) is a super-minimized testcase that illustrates the very strange brick wall I've crashed into.

I have a GINIT function that initializes an extension global, test_array, as an array. I then fill it with some entries using add_assoc_long(), exactly as my changes to pcntl would, in this case using sprintf() to generate dummy array keys like !!!, """, ###, etc.

I then have a demo function test_test1() that ZVAL_COPYs the pre-built test_array to return_value.

Drumroll please; behold what happens when I try to print_r() the result:

Array
(
    [PWD] => 0
    [i336] => 1
    [LOGNAME] => 2
    [tty] => 3
    [HOME] => 4
    [LANG] => 5
    [user] => 6
    [xterm] => 7
    [TERM] => 8
    [i336] => 9
    [USER] => 10
    [:0] => 11
    [DISPLAY] => 12
    [SHLVL] => 13
    [9:22836] => 14
    [PATH] => 15
    [111] => 16
    [222] => 17
    [333] => 18
    [444] => 19
    [555] => 20
    [666] => 21
    [777] => 22
    [888] => 23
    [999] => 24
    [HG] => 25
    [MAIL] => 26
    [OLDPWD] => 27
    [] => 28
    [] => 29
    [] => 30
    [STDIN] => 31
    [STDOUT] => 32
    [STDERR] => 33
    [print_r] => 34
    [DDD] => 35
    [EEE] => 36
    [FFF] => 37
    [GGG] => 38
    [HHH] => 39
    [III] => 40
    [JJJ] => 41
    [KKK] => 42
    [LLL] => 43
    [MMM] => 44
    [NNN] => 45
    [OOO] => 46
    [PPP] => 47
    [QQQ] => 48
    [RRR] => 49
<<snipped>>

What's really weird is that entries 0 to 15 are corrupt; entries 16 to 24 are fine; entries 25 to 34 are corrupt; entries 35 on are fine.

0-15 / 16-24 makes a weird kind of sense; 25-34 / 35-∞ does not.

In any case, if I replace test_test1 with the following (slight modification of the code from the GINIT function):

    zval test;
    array_init(&test);

    for (int i = 0; i < 80; i++) {
        char buf[4];
        sprintf(buf, "%1$c%1$c%1$c", i+33);
        add_assoc_long(&test, buf, i);
    }

    ZVAL_COPY_OR_DUP(return_value, &test);

    zval_ptr_dtor(&test);

I get the somewhat more expected:

Array
(
    [!!!] => 0
    ["""] => 1
    [###] => 2
    [$$$] => 3
    [%%%] => 4
    [&&&] => 5
    ['''] => 6
    [(((] => 7
    [)))] => 8
    [***] => 9
    [+++] => 10
    [,,,] => 11
    [---] => 12
    [...] => 13
    [///] => 14
    [000] => 15
    [111] => 16
    [222] => 17
    [333] => 18
    [444] => 19
    [555] => 20
    [666] => 21
    [777] => 22
    [888] => 23
    [999] => 24
    [:::] => 25
    [;;;] => 26
    [<<<] => 27
    [===] => 28
<<snipped>>

Besides some hints about what I'm doing wrong (I know I've got something backwards... :) ), I would very much like to understand why PHP is dumping portions of what appear to be random environment variables into my array!


The main reason I've halted my own exploration/solving process and posted this question is my awareness that I don't know what I don't know, combined with the fact that I've no idea where to turn to try to resolve this.

There is an increasing number of resources offering PHP documentation, but unfortunately figuring out how to do simple tasks seems to require a lot of piecing together of details from disparate sources (I'm stuck on something that honestly seems quite simple on the surface).

I also have questions about how up-to-date what I'm reading actually is.

An example: the ZEND_MODULE_GLOBALS_ACCESSOR() macro, used for thread-safe access to per-module global values, is used 37 times (by just under half the contents of ext/, it looks like). And yet all of the information I have read, including on sites like phpinternals.net and phpinternalsbook.net, specifies a hard requirement of including a certain 5-line #define in order to set up access to module globals. It was only by reading the source code that I stumbled on the aforementioned macro, which implements the #define in PHP itself so nobody has to write it themselves anymore.

I can completely accept that things aren't in exact sync - and that maybe that macro is new.

But where do I go for updated reference information that answers the questions I have?

Genuine question.


I've included config.m4 below, so this could be compiled for testing:

php_test.h:

#ifndef PHP_TEST_H
# define PHP_TEST_H

extern zend_module_entry test_module_entry;
# define phpext_test_ptr &test_module_entry

# define PHP_TEST_VERSION "0.1.0"

ZEND_BEGIN_MODULE_GLOBALS(test)
    zval test_array;
ZEND_END_MODULE_GLOBALS(test)

# if defined(ZTS) && defined(COMPILE_DL_TEST)
ZEND_TSRMLS_CACHE_EXTERN()
# endif


ZEND_DECLARE_MODULE_GLOBALS(test)

#endif  /* PHP_TEST_H */

test.c:

#ifdef HAVE_CONFIG_H
# include "config.h"
#endif

#include "php.h"
#include "ext/standard/info.h"
#include "php_test.h"

PHP_FUNCTION(test_test1)
{
    ZVAL_COPY(return_value, &ZEND_MODULE_GLOBALS_ACCESSOR(test, test_array));   
}

PHP_RINIT_FUNCTION(test)
{
#if defined(ZTS) && defined(COMPILE_DL_TEST)
    ZEND_TSRMLS_CACHE_UPDATE();
#endif

    return SUCCESS;
}

PHP_MINIT_FUNCTION(test)
{
    return SUCCESS;
}


PHP_GSHUTDOWN_FUNCTION(test)
{ }

PHP_GINIT_FUNCTION(test)
{

    // Thanks to #php.pecl on efnet for pointing me in the direction of `GINIT`.
    // I'd seriously hit my SIGSEGV limit, and really appreciated the valid pointers (punintended).

    array_init(&ZEND_MODULE_GLOBALS_ACCESSOR(test, test_array));

    for (int i = 0; i < 80; i++) {
        char buf[4];
        sprintf(buf, "%1$c%1$c%1$c", i+33);
        add_assoc_long(&ZEND_MODULE_GLOBALS_ACCESSOR(test, test_array), buf, i);
    }

    /* No return value: GINIT handlers are declared void. */

}

PHP_MINFO_FUNCTION(test)
{
    php_info_print_table_start();
    php_info_print_table_header(2, "test support", "enabled");
    php_info_print_table_end();
}

ZEND_BEGIN_ARG_INFO(arginfo_test_test1, 0)
ZEND_END_ARG_INFO()

ZEND_BEGIN_ARG_INFO(arginfo_test_test2, 0)
    ZEND_ARG_INFO(0, str)
ZEND_END_ARG_INFO()

static const zend_function_entry test_functions[] = {
    PHP_FE(test_test1, arginfo_test_test1)
    PHP_FE_END
};

zend_module_entry test_module_entry = {
    STANDARD_MODULE_HEADER,
    "test",                     /* Extension name */
    test_functions,             /* zend_function_entry */
    PHP_MINIT(test),            /* PHP_MINIT - Module initialization */
    NULL,                       /* PHP_MSHUTDOWN - Module shutdown */
    PHP_RINIT(test),            /* PHP_RINIT - Request initialization */
    NULL,                       /* PHP_RSHUTDOWN - Request shutdown */
    PHP_MINFO(test),            /* PHP_MINFO - Module info */
    PHP_TEST_VERSION,           /* Version */
    PHP_MODULE_GLOBALS(test),
    PHP_GINIT(test),
    PHP_GSHUTDOWN(test),
    NULL,                       /* PRSHUTDOWN() */
    STANDARD_MODULE_PROPERTIES_EX
};

#ifdef COMPILE_DL_TEST
# ifdef ZTS
ZEND_TSRMLS_CACHE_DEFINE()
# endif
ZEND_GET_MODULE(test)
#endif

config.m4:

PHP_ARG_ENABLE([test],
  [whether to enable test support],
  [AS_HELP_STRING([--enable-test],
    [Enable test support])],
  [no])

if test "$PHP_TEST" != "no"; then
  AC_DEFINE(HAVE_TEST, 1, [ Have test support ])

  PHP_NEW_EXTENSION(test, test.c, $ext_shared)
fi
Barmar
i336_
  • How about just looping over `get_defined_constants()` and getting the ones that begin with `SIG`? – Barmar Sep 17 '19 at 16:01
  • An excellent suggestion, and what I tried to begin with. The returned array started like `[-1 => SIG_DFL, 0 => SIG_IGN, 1 => SIG_UNBLOCK, 2 => SIG_SETMASK, 3 => SIGQUIT, ...]` because of all the overlapping values. I really, really don't want to do `preg_match('/^SIG[^_]/')` - changes to `pcntl` could theoretically break behavior in the future. – i336_ Sep 17 '19 at 16:05
  • @Barmar: [Thanks for the addition of the `php-internals` tag!] – i336_ Sep 17 '19 at 16:05
  • While there could be a few false positives, I think if you just use that regexp it will be pretty correct. You could also parse `/usr/include/sys/signal.h` – Barmar Sep 17 '19 at 16:06
  • Although doing that would require interpreting the `#ifdef` lines that conditionalize the names. – Barmar Sep 17 '19 at 16:08
  • You might want to drop by https://chat.stackoverflow.com/rooms/11/php for help with PHP extension development. – NikiC Sep 17 '19 at 17:43
  • Last-minute reprioritization ftl. Thanks very much for the pointer/invitation. I'm looking forward to following up on it... a little later than I'd hoped/anticipated. :) – i336_ Sep 19 '19 at 00:36

1 Answer


GINIT is invoked prior to request startup. array_init() and add_assoc_long() (and most other APIs) use the per-request allocator.
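
To picture why keys built in GINIT can later read back as unrelated strings such as environment variable names, here is a toy model in plain C. It is only a sketch of the lifetime rule and not the Zend memory manager: the bump allocator and the names arena_strdup, arena_reset and stale_key_demo are all hypothetical. It assumes that memory handed out before the request boundary is treated as free afterwards and then reused by later allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy bump allocator standing in for a per-request heap (hypothetical;
 * this is NOT the Zend allocator, only a model of its lifetime rule). */
static char arena[64];
static size_t arena_used = 0;

/* Copy a string into the arena and return a pointer to the copy. */
static const char *arena_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    assert(arena_used + len <= sizeof(arena));
    char *p = arena + arena_used;
    memcpy(p, s, len);
    arena_used += len;
    return p;
}

/* "Request boundary": everything allocated so far is considered freed. */
static void arena_reset(void)
{
    arena_used = 0;
}

/* Returns what a pointer saved before the boundary reads back as once
 * later code has allocated from the same arena. */
static const char *stale_key_demo(void)
{
    arena_reset();                          /* start from a clean arena     */
    const char *key = arena_strdup("!!!");  /* stored "globally" at startup */
    arena_reset();                          /* the request boundary passes  */
    arena_strdup("PWD");                    /* later code reuses the slot   */
    return key;                             /* still points into the arena  */
}
```

In this model the saved pointer is never freed explicitly, yet the bytes behind it now belong to whoever allocated next. That would be consistent with some keys in the print_r() output surviving while others are overwritten.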

You could use persistent allocations instead (by using lower-level zend_hash and zend_string APIs and passing persistent=1 flags), but you still wouldn't be allowed to return such an array from a PHP function, because this violates the PHP memory model (you are not permitted to change the refcount of a persistent value during a request).

If you want to place a value using the per-request allocator inside a global, you need to do so inside RINIT (and then destroy inside RSHUTDOWN). These handlers are invoked as part of each request.

Though for your particular use-case I would recommend not using globals at all, and instead simply constructing the array anew each time the function is called. It is not performance-critical.

NikiC
  • Ah, I *see*. (Arrays cannot be accessed such that the refcount is not touched?) FWIW, I was "borrowing" using MINIT because `pcntl` was registering all its signal handler constants at that point... and I have no idea what internal API(s) I'd use to stash the list of signal names so I could later return them :) *Maybe* I could move that initialization function to RINIT... – i336_ Sep 19 '19 at 00:40
  • You can create a persistent array and use it, just not return it directly (you'd have to copy it). You'd have to drop down to the zend_hash APIs for it, see especially the "persistent" argument in zend_hash_init. Alternatively you could extract the logic that registers the SIG constants into a separate function that invokes a callback. Then you could use it once for registering signals and once for creating the array. – NikiC Sep 21 '19 at 11:17
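
The callback idea from the last comment could look roughly like the following, written as standalone C rather than against the PHP API. It is a minimal sketch under assumptions: for_each_signal, the visitor types and describe_signal are hypothetical names, and the signal list is a small illustrative subset. In an extension, one visitor would presumably register the constant and another would call add_assoc_long() on a freshly built array.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Callback invoked once per compiled-in signal. */
typedef void (*signal_visitor)(const char *name, int number, void *ctx);

/* The single place that knows which signals exist on this platform.
 * Each entry is guarded, so signals missing from <signal.h> are skipped.
 * Illustrative subset only. */
static void for_each_signal(signal_visitor visit, void *ctx)
{
    assert(visit != NULL);
#ifdef SIGHUP
    visit("SIGHUP", SIGHUP, ctx);
#endif
#ifdef SIGINT
    visit("SIGINT", SIGINT, ctx);
#endif
#ifdef SIGQUIT
    visit("SIGQUIT", SIGQUIT, ctx);
#endif
#ifdef SIGTRAP
    visit("SIGTRAP", SIGTRAP, ctx);
#endif
#ifdef SIGTERM
    visit("SIGTERM", SIGTERM, ctx);
#endif
}

/* Visitor 1: count entries (a stand-in for "register a constant"). */
static void count_visitor(const char *name, int number, void *ctx)
{
    (void)name;
    (void)number;
    ++*(int *)ctx;
}

/* Visitor 2: find the name that belongs to one signal number. */
struct lookup {
    int wanted;
    const char *found;
};

static void lookup_visitor(const char *name, int number, void *ctx)
{
    struct lookup *l = ctx;
    if (number == l->wanted && l->found == NULL) {
        l->found = name;
    }
}

/* Writes a message such as "Interrupted by SIGINT (<number>)" into buf. */
static const char *describe_signal(int signo, char *buf, size_t len)
{
    struct lookup l = { signo, NULL };
    for_each_signal(lookup_visitor, &l);
    snprintf(buf, len, "Interrupted by %s (%d)",
             l.found ? l.found : "unknown signal", signo);
    return buf;
}
```

Because the list of names lives in exactly one function, the constants and the returned array cannot drift apart, and nothing has to be kept in a global between requests.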