6

How to define in ISO Prolog a (meta-logical) predicate for the intersection of two lists of variables that runs in linear time? The variables may appear in any determined order. No implementation dependent property like the "age" of variables must influence the outcome.

In analogy to library(ordsets), let's call the relation varset_intersection(As, Bs, As_cap_Bs).

?- varset_intersection([A,B], [C,D], []).
true.

?-varset_intersection([A,B], [B,A], []).
false.

?- varset_intersection([A,B,C], [C,A,D], Inter).
Inter = [A,C].
or
Inter = [C,A].

?- varset_intersection([A,B],[A,B],[A,C]).
B = C
or
A = B, A = C

?- varset_intersection([A,B,C],[A,B],[A,C]).
idem

That is, the third argument is an output argument, that unifies with the intersection of the first two arguments.

See this list of the built-ins from the current ISO standard (ISO/IEC 13211-1:1995 including Cor.2).

(Note, that I did answer this question in the course of another one several years ago. However, it remains hidden and invisible to Google.)

false
  • 10,264
  • 13
  • 101
  • 209
  • 1
    The answer to the first query (`varset_intersection([A,B], [B,A], []).`) is false, right? – Tudor Berariu Jan 04 '15 at 09:13
  • What should be the result of the following queries: `varset_intersection([A,B],[A,B],[A,C]).` and `varset_intersection([A,B,C],[A,B],[A,C]).` Should these goals be satisfied by unifying the *real* intersection with the third argument? – Tudor Berariu Jan 04 '15 at 09:36
  • It should be clear now. – false Jan 04 '15 at 10:35
  • One thing I need to further understand: by putting that *or* between different results you mean that the goal should be satisfied only once with any of those two results? – Tudor Berariu Jan 04 '15 at 10:47
  • I give up... how to do in O(|L1|+|L2|) ? waiting for some clever 'trick'... – CapelliC Jan 04 '15 at 11:06
  • @TudorBerariu: Only once. You can choose which one. But your choice must not depend on implementation dependent properties like variable age. – false Jan 04 '15 at 11:52
  • Why wouldn't `B=D, Inter=[A,B,C]` be a solution for the third example query? I don't see the reason for that not being a valid solution but `B=C` being one for the fifth query. – Tudor Berariu Jan 04 '15 at 14:46
  • @TudorBerariu: in `varset_intersection([A,B,C], [C,A,D], Inter)` only `A` and `C` are in the intersection. So why should `B = D`? – false Jan 04 '15 at 15:33
  • Well, it wasn't clear for me what relation this predicate should express. In that case you could say that if `B = D` then the intersection is `[A,B,C]`. When a list of variables (that also appear in the first two lists) is given as the third argument, unification between variables might happen and lead to some strange results [at least for me] that I could not separate from the one I suggested. – Tudor Berariu Jan 04 '15 at 16:14

4 Answers4

3

If term_variables/2 works in a time linear with the size of its first argument, then this might work:

varset_intersection(As, Bs, As_cap_Bs):-
    term_variables([As, Bs], As_and_Bs),
    term_variables(As, SetAs),
    append(SetAs, OnlyBs, As_and_Bs),
    term_variables([OnlyBs, Bs], SetBs),
    append(OnlyBs, As_cap_Bs, SetBs).

Each common variable appears only once in the result list no matter how many times it appears in the two given lists.

?- varset_intersection2([A,_C,A,A,A], [A,_B,A,A,A], L).
L = [A].

Also, it might give strange results as in this case:

?- varset_intersection([A,_X,B,C], [B,C,_Y,A], [C, A, B]).
A = B, B = C.

(permutation/2 might help here).

Tudor Berariu
  • 4,910
  • 2
  • 18
  • 29
2

If copy_term/2 uses linear time, I believe the following works:

varset_intersection(As, Bs, Cs) :-
    copy_term(As-Bs, CopyAs-CopyBs),
    ground_list(CopyAs),
    select_grounded(CopyBs, Bs, Cs).

ground_list([]).
ground_list([a|Xs]) :-
    ground_list(Xs).

select_grounded([], [], []).
select_grounded([X|Xs], [_|Bs], Cs) :-
    var(X),
    !,
    select_grounded(Xs, Bs, Cs).
select_grounded([_|Xs], [B|Bs], [B|Cs]) :-
    select_grounded(Xs, Bs, Cs).

The idea is to copy both lists in one call to copy_term/2 to preserve shared variables between them, then ground the variables of the first copy, then pick out the original variables of the second list corresponding to the grounded positions of the second copy.

1

If unify_with_occurs_check(Var, ListOfVars) fails or succeeds in constant time, this implementation should yield answers in linear time:

filter_vars([], _, Acc, Acc).
filter_vars([A|As], Bs, Acc, As_cap_Bs):-
    (
        \+ unify_with_occurs_check(A, Bs)
      ->
        filter_vars(As, Bs, [A|Acc], As_cap_Bs)
      ;
        filter_vars(As, Bs, Acc, As_cap_Bs)
    ).

varset_intersection(As, Bs, As_cap_Bs):-
    filter_vars(As, Bs, [], Inter),
    permutation(Inter, As_cap_Bs).

This implementation has problems when given lists contain duplicates:

?- varset_intersection1([A,A,A,A,B], [B,A], L).
L = [B, A, A, A, A] ;

?- varset_intersection1([B,A], [A,A,A,A,B], L).
L = [A, B] ;

Edited : changed bagof/3 to a manually written filter thanks to observation by @false in comments below.

Tudor Berariu
  • 4,910
  • 2
  • 18
  • 29
  • Nice. Lifting the scan inside unify_with_occurs_check could work, if any implementation actually is able to do in constant time - which I think it's unlikely. Maybe permutation/2 it's just distracting... – CapelliC Jan 04 '15 at 11:31
  • `unify_with_occurs_check` requires in the worst case time proportional to the *entire* term. – false Jan 04 '15 at 11:53
  • `bagof/3` is quadratic: The witness of variables needs to contain all `As` and `Bs`. That is, for a single variable `V` |`As`|+|`Bs`| space is needed. This consideration is entirely independent of `unify_with_occurs_check/2`! – false Jan 04 '15 at 15:36
  • Thanks, I removed `bagof/3` and I wrote a different predicate to collect the solutions. – Tudor Berariu Jan 04 '15 at 16:40
1

A possible solution is to use a Bloom filter. In case of collision, the result might be wrong, but functions with better collision resistance exist. Here is an implementation that uses a single hash function.

sum_codes([], _, Sum, Sum).
sum_codes([Head|Tail], K, Acc,Sum):-
    Acc1 is Head * (256 ** K) + Acc,
    K1 is (K + 1) mod 4,
    sum_codes(Tail, K1, Acc1, Sum).

hash_func(Var, HashValue):-
    with_output_to(atom(A), write(Var)),
    atom_codes(A, Codes),
    sum_codes(Codes, 0, 0, Sum),
    HashValue is Sum mod 1024.

add_to_bitarray(Var, BAIn, BAOut):-
    hash_func(Var, HashValue),
    BAOut is BAIn \/ (1 << HashValue).

bitarray_contains(BA, Var):-
    hash_func(Var, HashValue),
    R is BA /\ (1 << HashValue),
    R > 0.

varset_intersection(As, Bs, As_cap_Bs):-
    foldl(add_to_bitarray, As, 0, BA),
    include(bitarray_contains(BA), Bs, As_cap_Bs).

I know that foldl/4 and include/3 are not ISO, but their implementation is easy.

Tudor Berariu
  • 4,910
  • 2
  • 18
  • 29
  • Note that in a standard conforming system `write(Var)` with `Var` being an uninstantiated variable, may always give the same text, like `_1`. So this solution depends on an implementation dependent feature (the naming of variables). Further, the names might change due to garbage collection in between. It is much much simpler! – false Jan 04 '15 at 15:35