If I do a match with a regular expression with ten captures:

/(o)(t)(th)(f)(fi)(s)(se)(e)(n)(t)/.match("otthffisseent")

then, for $10, I get:

$10 # => "t"

but it is missing from global_variables. I get (in an irb session):

[:$;, :$-F, :$@, :$!, :$SAFE, :$~, :$&, :$`, :$', :$+, :$=, :$KCODE, :$-K, :$,,
 :$/, :$-0, :$\, :$_, :$stdin, :$stdout, :$stderr, :$>, :$<, :$., :$FILENAME,
 :$-i, :$*, :$?, :$$, :$:, :$-I, :$LOAD_PATH, :$", :$LOADED_FEATURES,
 :$VERBOSE, :$-v, :$-w, :$-W, :$DEBUG, :$-d, :$0, :$PROGRAM_NAME, :$-p, :$-l,
 :$-a, :$binding, :$1, :$2, :$3, :$4, :$5, :$6, :$7, :$8, :$9]

Here, only the first nine are listed:

:$1, :$2, :$3, :$4, :$5, :$6, :$7, :$8, :$9

This is also confirmed by:

global_variables.include?(:$10) # => false

Where is $10 stored, and why isn’t it stored in global_variables?

Sagar Pandya

3 Answers

Ruby seems to handle $1, $2 etc. at the parser level:

ruby --dump parsetree_with_comment -e '$100'

Output:

###########################################################
## Do NOT use this node dump for any purpose other than  ##
## debug and research.  Compatibility is not guaranteed. ##
###########################################################

# @ NODE_SCOPE (line: 1)
# | # new scope
# | # format: [nd_tbl]: local table, [nd_args]: arguments, [nd_body]: body
# +- nd_tbl (local table): (empty)
# +- nd_args (arguments):
# |   (null node)
# +- nd_body (body):
#     @ NODE_NTH_REF (line: 1)
#     | # nth special variable reference
#     | # format: $[nd_nth]
#     | # example: $1, $2, ..
#     +- nd_nth (variable): $100
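
The same split can be observed from Ruby itself. This is a minimal sketch, assuming the standard library's Ripper is available; the helper name token_type is made up for illustration. It only shows that the lexer reports a numbered reference with a different token type than an ordinary global, which is consistent with the parser-level handling described above.

```ruby
require 'ripper'

# Hypothetical helper: return the event type Ripper assigns to the
# first token of the given source string.
def token_type(source)
  Ripper.lex(source).first[1]
end

puts token_type('$100')  # numbered reference
puts token_type('$foo')  # ordinary global variable
```

The two printed types should differ; the exact symbol names are whatever your Ruby's Ripper reports.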

BTW, the maximum number of capture groups is 32,767, and you can access all of them via $n:

/#{'()' * 32768}/       #=> RegexpError: too many capture groups are specified

/#{'()' * 32767}/ =~ '' #=> 0
defined? $32767         #=> "global-variable"
$32767                  #=> ""
Stefan
  • Nice! This is the complement that my answer needed. :-) – Drenmi Jan 09 '16 at 10:50
  • Amazing power of Ruby. I will never run out of capture groups. – sawa Jan 09 '16 at 12:02
  • [PCRE2 supports 65,535 capture groups](http://stackoverflow.com/a/33928343/3832970), so it beats Ruby :) However, I doubt you will ever use more than 99. In real life, we'd rather get multiple matches than try to capture all details with one unwieldy regex. – Wiktor Stribiżew Jan 09 '16 at 13:46

The numbered variables returned from Kernel#global_variables will always be the same, even before they are assigned. That is, $1 through $9 are returned even before you perform a match, and matching more groups won't add to the list. (They also cannot be assigned to, e.g. with $10 = "foo".)
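
To illustrate the point about assignment, here is a small sketch. The helper assignable? and the global $example_global are hypothetical names introduced only for this example; the code is passed through eval because the assignment is rejected when the source is parsed, so it cannot appear literally in a script that is expected to run.

```ruby
# Hypothetical helper: true if the snippet parses and runs, false if the
# parser rejects it. SyntaxError is not a StandardError, so it has to be
# rescued explicitly.
def assignable?(source)
  eval(source)
  true
rescue SyntaxError
  false
end

puts assignable?('$example_global = "foo"')  # an ordinary global
puts assignable?('$10 = "foo"')              # a numbered reference
```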

Consider the C source code for Kernel#global_variables:

VALUE
rb_f_global_variables(void)
{
    VALUE ary = rb_ary_new();
    char buf[2];
    int i;

    st_foreach_safe(rb_global_tbl, gvar_i, ary);
    buf[0] = '$';

    for (i = 1; i <= 9; ++i) {
        buf[1] = (char)(i + '0');
        rb_ary_push(ary, ID2SYM(rb_intern2(buf, 2)));
    }

    return ary;
}

You can (after getting used to looking at C) see from the for loop that the symbols $1 through $9 are hard-coded into the return value of the method.
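
For readers less comfortable with C, the loop can be paraphrased in Ruby. This is only a sketch of what the loop builds, not the interpreter's code, and the method name hard_coded_numbered_symbols is invented for the example.

```ruby
# Ruby paraphrase of the C for loop: build the symbols :$1 through :$9
# by hand, independently of any match that has been performed.
def hard_coded_numbered_symbols
  (1..9).map { |i| "$#{i}".to_sym }
end

p hard_coded_numbered_symbols
```

Because the upper bound is the literal 9, nothing in this loop could ever produce :$10, no matter how many groups the last match captured.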

So how, then, can you still use $10 if the output of global_variables doesn't change? The output is a bit misleading: it suggests your match data is stored in separate variables, but these are just shortcuts that delegate to the MatchData object stored in $~.

Essentially, $n reads $~[n]. You'll find that $~, which holds this MatchData object and comes from the global table, is part of the original output from the method, but it is not assigned until you do a match.
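
The delegation can be checked with the regular expression from the question. A minimal sketch; the method name numbered_vs_match_data is made up, and the match is wrapped in a method only so that the results can be returned and compared.

```ruby
# Run the match from the question and collect the same captures through
# three routes: the numbered global, indexing $~, and Regexp.last_match.
def numbered_vs_match_data
  /(o)(t)(th)(f)(fi)(s)(se)(e)(n)(t)/.match("otthffisseent")
  {
    dollar_1:      $1,
    index_1:       $~[1],
    dollar_10:     $10,
    index_10:      $~[10],
    last_match_10: Regexp.last_match(10),
  }
end

p numbered_vs_match_data
```

All three routes to the tenth group should give the same string, which is what you would expect if $10 is just a view onto $~.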

As to what the justification is for including $1 through $9 in the output of the function, you would need to ask someone on the Ruby core team. It might seem arbitrary, but some deliberation likely went into the decision.

Drenmi
  • *"The values returned from Kernel#global_variables will always be the same"*--This is not true. If I do `$foo = 1`, then `$foo` will be added to `global_variables`. – sawa Jan 09 '16 at 10:15
  • One mystery still left: how is `$10` evaluated if it is not a global variable? – Wand Maker Jan 09 '16 at 10:24
  • @WandMaker: If my understanding of the internals is correct, all `$n` format globals are delegated to the respective index in `$~`. (This would be the reason you can't assign them.) So in a sense, the output from this method is a bit "dishonest." – Drenmi Jan 09 '16 at 10:26
  • @Drenmi I believe what you are saying is probably what's happening. Evidence of that would seal the deal. – Wand Maker Jan 09 '16 at 10:28
  • @Drenmi The rubyspecs say so: https://github.com/ruby/rubyspec/blob/f8358bd32e6d2c492f8d7e7bb5a35524d2756c3c/language/predefined_spec.rb#L89 – bliof Jan 09 '16 at 10:30
  • @Wand Maker It is a "global-variable"; try `defined?($10)` :) – bliof Jan 09 '16 at 10:31

We consider this behavior a bug. We fixed it in trunk.

  • Are you a developer on the project in question? Could you say something about that, or in some way elaborate on what you mean? Perhaps which version will fix, or did fix, this? – melwil Jan 25 '18 at 18:49