Subsequence with a limit

Question

This is my problem:

Write a procedure distance(List, Nemptylist, SubList)/3 that checks if Sublist is a sublist of List with a distance of not more than N between items constraint (N is implemented as Nemptylist – a list of N anonymous variables). Assume that List is fully instantiated. Duplicate answers are allowed, as long as their number is finite.

For example, for N = 2 :
?- distance( [a,t,d,r,a,n,c,b,c] , [_,_], [a,b,c] ).
true

?- distance( [m,a,t,d,r,b,c,t] , [_,_] , [a,b,c] ).
false

?- distance([a, t, d, r, a, n, c, b], [_, _], [a, b, c]).
false

?- distance([c, c, c, a, c, c, c], [_, _], [a]).
true.

I've been sitting for hours, trying to solve this problem and eventually the examples above worked, but then i ran some tests and they failed.

My solution for now is as follows:

distance( L1 , L2 , [X]   ) :-
  member(X,L1) .
distance( L1 , L2 , [H|T] ) :-
  distance(L1,L2,T) ,
  append(Y,Z,L2) ,
  T=[Hs|Ts] ,
  append([H],Y,W) ,
  append(W,[Hs],R) ,
  sublist(R,L1) .

prefix(X,L) :- append(X, _, L).

suffix(X,L) :- append(_, X, L).

sublist(X,L) :- suffix(S,L) , prefix(X,S) .

when i try to run this test:

distance( [r,a,n,c,b,c],[],X) .

it fails due to run-time exceeded error.

I debugged it for hours and I'm really exhausted. Please help me finish this horrible assignment.

false · Answer 1 · 2014-06-14T11:55:47.760

Here is a solution step-by-step starting with an incomplete definition:

distance_tentative(Xs, _Ys, Zs) :-
   phrase(( ..., seq(Zs), ... ), Xs).

... --> [] | [_], ... .

seq([]) --> [].
seq([E|Es]) --> [E], seq(Es).

This solution is too specialized because it describes only substrings but not subsequences. A subsequence is rather:

subseq([]) --> [].
subseq([E|Es]) --> [E], subseq(Es).
subseq(Es) --> [_], subseq(Es).

Now, we want to limit the number of intermediary unrelated elements. That is, we want to limit the application of the last rule to the length of this list argument LN.

subseq_n([], _) --> [].
subseq_n([E|Es], LN) --> [E], subseq_n(Es, LN).
subseq_n(Es, [_|LN]) --> [_], subseq_n(Es, LN).

Maybe the last rule should rather read:

subseq_n(Es, [E|LN]) --> [E], subseq_n(Es, LN).

I suspect some problem in the translation of the problem statement. In any case, now we have:

distance(Xs, Ys, Zs) :-
   phrase(( ..., subseq_n(Zs, Ys), ... ), Xs).

There are many redundant answers, but you stated that this is OK.

Optimizations

There is a lot of redundancy that is, ambiguity, between the first ... and the start of the first element of subseq_n//2; similarly, between subseq_n//2 and ... at the end. Further, if Zs is empty, a single answer would suffice. Brief

distance(_Xs, _Ys, []).
distance(Xs, Ys, [Z|Zs]) :-
   phrase( ( ..., [Z], rsubseq_n(Zs, Ys), ... ), Xs).

rsubseq_n([], _) --> [].
rsubseq_n([E|Es], Ys) --> [E], rsubseq_n(Es, Ys).
rsubseq_n([E|Es], [_|Ys]) --> [_], rsubseq_n([E|Es], Ys).

Note that the "distance list" is now used only within the subsequence.

This program has very favorable termination properties:

distance(A,B,C)terminates_if b(A).

So only the first argument has to be known to make the predicate terminate.

Edit: Your problem statement has been ambiguous w.r.t. where the distance N applies to:

... with a distance of not more than N between items constraint ...

This can mean a total edit distance of no more than N, or a distance between each consecutive pair. So assuming that the distance between each consecutive pair is meant:

distanceII(_Xs, _Ys, []).
distanceII(Xs, Ys, [Z|Zs]) :-
   phrase( ( ..., [Z], rsubseq_d(Zs, Ys), ... ), Xs).

rsubseq_d([], _) --> [].
rsubseq_d([E|Es],Max) -->
   maxseq(Max),
   [E],
   rsubseq_d(Es, Max).

maxseq(_) --> [].
maxseq([_|Es]) --> [_], maxseq(Es).

Tudor Berariu · Accepted Answer · 2014-06-14T08:37:24.967

I see two major problems with your implementation:

You are unifying elements from NEmptyList with elements from List in the sublist(R, L1) goal. Future sublist goals might fail as NEmptyList will not unify with any N consecutive elements from List. If you want to use append (which is ok), you should build a new list of length at most N every time (see below).
You might validate different sequences from SubList with the same unique sequence from List. To demonstrate this, try this with your solution:
```
?- distance([a,a],[],[a,a,a,a,a,a]).
true ;
true ;
false.
```

Here is a solution:

distance(_, _, []).
distance(List, NEmpty, Sublist):-
    append(_, Right, List),   % The first element can be anywhere in the list
    distance2(Right, NEmpty, Sublist).

distance2([X|_], _, [X]).
distance2(List, NEmpty, [X|Sublist]):-
    length(NEmpty, N),
    between(0, N, N1),
    length(AtMostNEmpty, N1), % Make a different list each time of length <N
    append([X|AtMostNEmpty], Right, List),
    distance2(Right, NEmpty, Sublist).

Subsequence with a limit

2 Answers2

Optimizations