dissecting this code, explanation - check if array subset of another array

Question

I found this example at wiki.bash-hackers.org, but it is not explained in detail, so I was hoping someone here could shed some light on it and explain what is happening.

I understand the first line of the isSubset function: it takes the passed arguments and, using indirect referencing, stores the keys in the local arrays xkeys and ykeys.

The second line sets the positional parameters, but I don't understand what ${@/%/[key]} is doing. It looks like a substitution that changes % to [key], but I have no clue what happens here.

The next line compares the number of elements in the two arrays, but shouldn't it be reversed, returning 1 if the first array has more elements, because then it can't be a subset of the second one?

Finally, [[ ${!2+_} && ${!1} == ${!2} ]] || return 1 is pretty confusing.

isSubset() {
    local -a 'xkeys=("${!'"$1"'[@]}")' 'ykeys=("${!'"$2"'[@]}")'
    set -- "${@/%/[key]}"

    (( ${#xkeys[@]} <= ${#ykeys[@]} )) || return 1

    local key
    for key in "${xkeys[@]}"; do
        [[ ${!2+_} && ${!1} == ${!2} ]] || return 1
    done
}

main() {
    # "a" is a subset of "b"
    local -a 'a=({0..5})' 'b=({0..10})'
    isSubset a b
    echo $? # true

    # "a" contains a key not in "b"
    local -a 'a=([5]=5 {6..11})' 'b=({0..10})'
    isSubset a b
    echo $? # false

    # "a" contains an element whose value != the corresponding member of "b"
    local -a 'a=([5]=5 6 8 9 10)' 'b=({0..10})'
    isSubset a b
    echo $? # false
}

main

Entirely without snark: print out the intermediate values after each line to see what they are doing and you'll find out. (Use `declare -p`, not `echo`.) — Etan Reisner, Oct 22 '14 at 01:31
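
To illustrate the suggestion in the comment above, here is a small sketch of why `declare -p` tends to be more informative than `echo` when inspecting an array. The array `a` and the variable names are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Hypothetical sparse array: index 0 is unset, index 1 holds an empty string.
a=([1]="" [2]=2 [3]=3)

# echo flattens the array: the indices and the empty element are invisible.
echo_view=$(echo "${a[@]}")

# declare -p prints every index with its quoted value, so nothing is hidden.
declare_view=$(declare -p a)

echo "echo:       $echo_view"
echo "declare -p: $declare_view"
```

The exact layout of the `declare -p` line differs slightly between bash versions, but it always shows which indices exist and what each one holds.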

danadam · Accepted Answer · 2014-10-22T02:20:34.193

2nd line:

    ${@/%/[key]}

A % as the first character of the pattern means that the pattern has to match at the end of the value. There is nothing else in the pattern, so the meaning is "replace the empty string at the end with '[key]'", which simply appends [key]. After that, the positional parameters look like this:

    1 = a[key]
    2 = b[key]
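
The substitution can be tried in isolation. This is a minimal sketch; the function names append_key and prepend_tag are hypothetical and exist only to give the positional parameters a scope of their own.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "[key]" to every positional parameter.
append_key() {
    set -- "${@/%/[key]}"     # % anchors the (empty) pattern at the end
    printf '%s\n' "$@"
}

# Hypothetical helper: the # anchor does the same at the beginning.
prepend_tag() {
    set -- "${@/#/tag_}"
    printf '%s\n' "$@"
}

appended=$(append_key a b)
prepended=$(prepend_tag a b)

echo "$appended"     # a[key] and b[key], one per line
echo "$prepended"    # tag_a and tag_b, one per line
```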

Next line:

but shouldn't it be reverse, returning 1 if first array has more elements,

But it does that. Notice that the || operator is used, so the function returns 1 if the condition is not met. The condition is "x.size <= y.size", so it returns 1 if "x.size > y.size".
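
The "condition || return 1" idiom can be checked on its own. In this sketch, size_ok is a hypothetical stand-in that receives the two element counts directly instead of computing them from arrays.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the size check in isSubset.
size_ok() {
    local x_size=$1 y_size=$2
    # return 1 runs only when the arithmetic test is false, i.e. x_size > y_size
    (( x_size <= y_size )) || return 1
}

first=0;  size_ok 6 11  || first=$?    # 6 <= 11 holds, so the function succeeds
second=0; size_ok 12 11 || second=$?   # 12 <= 11 fails, so return 1 runs

echo "$first $second"    # 0 1
```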

Finally:

    [[ ${!2+_} && ${!1} == ${!2} ]] || return 1

To be honest, I don't know what +_ is for. As for the rest, notice that we are in a loop with a key variable. The positional parameters also contain key, so:

    ${!1}

becomes

    ${a[key]}

and the key variable takes the values of the keys of array a. So the whole test verifies that a value with the given key exists in the second array:

    [[ ${!2+_} && ...

and that value with that key in the first array is the same as the value with that key in the second array:

    ... && ${!1} == ${!2} ]]

The first condition is necessary to detect the case where array a holds an empty string at index i and array b doesn't have index i at all:

local -a 'a=([1]="" 2 3)' 'b=([2]=2 {3..10})'
isSubset a b
echo $? # false
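
The effect of that first condition can be seen by running the comparison with and without it. This is a sketch using the same shape of data as the example above; the variables naive and checked are made up for the demonstration.

```bash
#!/usr/bin/env bash
a=([1]="" [2]=2 [3]=3)    # a[1] is set, but empty
b=([2]=2 [3]=3)           # b[1] does not exist
key=1

# Without the existence check, the empty a[1] compares equal to the
# empty expansion of the unset b[1].
if [[ ${a[key]} == "${b[key]}" ]]; then naive=match; else naive=differ; fi

# With ${b[key]+_}, the unset element is rejected before the comparison.
if [[ ${b[key]+_} && ${a[key]} == "${b[key]}" ]]; then checked=match; else checked=differ; fi

echo "naive=$naive checked=$checked"    # naive=match checked=differ
```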

I totally overlooked the `||` in `[[ ${!2+_} && ${!1} == ${!2} ]] || return 1`, probably from too much staring at the screen; I was interpreting it as `then` instead of `or`. Thanks for explaining the other things! — branquito, Oct 22 '14 at 09:19

score 1 · Answer 2 · answered Oct 22 '14 at 02:04

The explanation for ${@/%/[key]} is this section of the bash man page:

${parameter/pattern/string}

The pattern is expanded to produce a pattern just as in pathname expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. If pattern begins with /, all matches of pattern are replaced with string. Normally only the first match is replaced. If pattern begins with #, it must match at the beginning of the expanded value of parameter. If pattern begins with %, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If parameter is @ or *, the substitution operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the substitution operation is applied to each member of the array in turn, and the expansion is the resultant list.

Specifically the bit about % there in the middle. So ${@/%/[key]} matches the end of the string for each positional parameter and appends [key] to it.

Assume a call isSubset a b where a='([0]="0" [1]="1" [2]="2" [3]="3" [4]="4" [5]="5")' and b='([0]="0" [1]="1" [2]="2" [3]="3" [4]="4" [5]="5" [6]="6" [7]="7" [8]="8" [9]="9" [10]="10")'. What happens in isSubset goes like this:

isSubset() {
    local -a 'xkeys=("${!'"$1"'[@]}")' 'ykeys=("${!'"$2"'[@]}")'

Interpolate $1 and $2 into the above line and we get

    local -a 'xkeys=("${!a[@]}")' 'ykeys=("${!b[@]}")'

Which expands to (via ${!arr[@]} array index expansion)

    local -a 'xkeys=(0 1 2 3 4 5)' 'ykeys=(0 1 2 3 4 5 6 7 8 9 10)'

At this point we now have xkeys and ykeys arrays of the keys of the arrays passed in.
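
For comparison, the same key list can be collected with a nameref instead of the quoted local -a string. This is a sketch of an alternative technique, not what the wiki code does; it assumes bash 4.3 or newer, and list_keys and _ref are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the keys of the array whose name is passed in.
list_keys() {
    local -n _ref=$1          # _ref becomes an alias for the caller's array
    echo "${!_ref[@]}"        # ${!name[@]} expands to the array's keys
}

a=([0]=x [3]=y [7]=z)
keys=$(list_keys a)

echo "$keys"    # 0 3 7
```

A nameref fails if the caller's array happens to have the same name as the local reference, which is why the sketch uses an unlikely name such as _ref.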

    set -- "${@/%/[key]}"

Recall that $@ is (a b), and from the man page snippet above we know that this becomes

    set -- 'a[key]' 'b[key]'

set -- then sets the function's positional parameters for us, so we have @=('a[key]' 'b[key]')

    (( ${#xkeys[@]} <= ${#ykeys[@]} )) || return 1

If xkeys is larger than ykeys then it can't be a subset so bail out. (This could be done before the set -- line and that would be slightly more efficient I imagine, though unlikely to matter in any but the hottest of loops.)

    local key
    for key in "${xkeys[@]}"; do

Loop over every key in xkeys (with the value assigned to key). (Note the variable name here, it is crucial.)[1]

        [[ ${!2+_} && ${!1} == ${!2} ]] || return 1

More indirection, this time on the positional parameters. The above expands to

        [[ ${b[key]+_} && ${a[key]} == ${b[key]} ]] || return 1

${b[key]+_} expands to _ if b[key] is set (even to an empty string) and to an empty string if it is unset. (I'm not sure why this bothers with the alternate value expansion instead of just using ${!2}, but there is probably a reason. It could be safety in the face of set -u, or it could be safety against [[ interpreting the resulting string, though I don't think [[ does that, whereas [ would have.) This test therefore passes when b[key] is set and fails when it is not.

${a[key]} == ${b[key]} tests that the value in that index is the same in both arrays and the whole expression returns failure from the function when either part fails.

@danadam correctly explains, and it makes sense to include here as well, that the key detail is that the [] subscript in array lookups undergoes variable expansion, so a[key] in positional parameter one does not look up a literal "key" index in array a but is evaluated as a[$key].

    done
}
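
The interaction between indirect expansion and the subscript can be tried outside the function. In this sketch, ref plays the role of positional parameter one after the set -- line; the array contents and the names first and second are made up for the demonstration.

```bash
#!/usr/bin/env bash
a=(zero one two)
ref='a[key]'       # same shape as $1 after set -- "${@/%/[key]}"

key=2
first=${!ref}      # expands ${a[key]}; the subscript is evaluated, giving a[2]

key=0
second=${!ref}     # same string in ref, but key is now 0, giving a[0]

echo "$first $second"    # two zero
```

The string stored in ref never changes; only the value of key does, which is why the loop variable in isSubset has to be named key.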

I hope that all made sense. (And I hope I got it all right. =)

[1] I really wanted to write "Note the variable name here, it is ... key." but the ensuing confusion wasn't worth the joke.

Re. "alternate value expansion", I was (still am) a bit confused because I thought that the format is ``${parameter:+word}`` (with colon). — danadam, Oct 22 '14 at 02:19
@danadam The manual has a one sentence mention of what the colon means there. It means "or null". `When not performing substring expansion, bash tests for a parameter that is unset or null; omitting the colon results in a test only for a parameter that is unset.` It is the last sentence in the paragraph before the explanation of `:-`. In general that distinction doesn't matter to people. Here it very much does. — Etan Reisner, Oct 22 '14 at 02:32
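
The distinction between the two forms discussed in these comments can be seen side by side. This is a small sketch; the variable names missing, empty and filled are made up for the demonstration.

```bash
#!/usr/bin/env bash
unset -v missing     # not set at all
empty=""             # set, but null
filled="x"           # set and non-empty

# Without the colon, only "unset" yields the empty result.
plus="${missing+_}|${empty+_}|${filled+_}"

# With the colon, both "unset" and "null" yield the empty result.
colon_plus="${missing:+_}|${empty:+_}|${filled:+_}"

echo "$plus"          # |_|_
echo "$colon_plus"    # ||_
```

In isSubset the form without the colon matters because an element that exists but holds an empty string must still count as present.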
