2

Say I want to find the set of features/attributes that differentiate two classes in a simple matching manner can I use clpfd in prolog to do this?

c_s_mining(Features,Value):-
 Features = [F1,F2,F3,F4],
 Features ins 0..1,
 ExampleA = [A1,A2,A3,A4],
 ExampleB =[B1,B2,B3,B4],
 ExampleC =[C1,C2,C3,C4],
 A1 #=0, A2#=1,A3#=0,A4#=1,
 B1 #=0, B2#=1,B3#=0,B4#=1,
 C1 #=1, C2#=0,C3#=0,C4#=1,

 ExampleD =[D1,D2,D3,D4],
 ExampleE =[E1,E2,E3,E4],
 ExampleQ =[Q1,Q2,Q3,Q4],
 D1#=1,D2#=0,D3#=1,D4#=0,
 E1#=1,E2#=0,E3#=1,E4#=0,
 Q1#=0,Q2#=1,Q3#=1,Q4#=0,

 Positives =[ExampleA,ExampleB,ExampleC],
 Negatives = [ExampleD,ExampleE,ExampleQ],
 TP in 0..sup,
 FP in 0..sup,
 covers(Features,Positives,TP),
 covers(Features,Negatives,FP),
 Value  in inf..sup,
 Value #= TP-FP.


covers(Features,Examples,Number_covered):-
   findall(*,(member(E,Examples),E=Features),Covers), length(Covers,Number_covered).

Each example is described by four binary features, and there are three positive examples (A,B,C) and three negative examples (D,E,Q).

An example is covered by a set of selected features if they match. So for example if Features is unified with [0,1,0,1], then this will match two positives and 0 negatives.

I set Value to be equal to TP (true positives) - TN (true negatives). I want to maximise Value and find the corresponding set of features.

I query ?-c_s_mining(Features,Value),labelling([max(Value)],[Value]). The answer I expect is: Features =[0,1,0,1], Value =2 but I get Features =[_G1,_G2,_G3,G4],Value =0, G1 in 0..1, G2 in 0..1, G3 in 0..1, G4 in 0..1.

user27815
  • 4,767
  • 14
  • 28

1 Answers1

3

Reification of CLP(FD) constraints

To reason about what is matched and what is not, use constraint reification: It allows you to reflect the truth value of a constraint into a CLP(FD) variable denoting a Boolean value.

You can perform arithmetic with such values to denote the number of matched examples etc.

For example, in your case, you can write:

:- use_module(library(clpfd)).

c_s_mining(Features, Value) :-
    ExampleA = [0,1,0,1],
    ExampleB = [0,1,0,1],
    ExampleC = [1,0,0,1],

    ExampleD = [1,0,1,0],
    ExampleE = [1,0,1,0],
    ExampleQ = [0,1,1,0],

    same_length(Features, ExampleA),
    Features ins 0..1,
    Positives = [ExampleA,ExampleB,ExampleC],
    Negatives = [ExampleD,ExampleE,ExampleQ],
    covers_number(Features, Positives, TP),
    covers_number(Features, Negatives, FP),
    Value #= TP-FP.


covers_number(Features, Examples, Number):-
    maplist(covers_(Features), Examples, Numbers),
    sum(Numbers, #=, Number).

covers_([F1,F2,F3,F4], [E1,E2,E3,E4], Covered) :-
    Covered #<==> (F1#=E1 #/\ F2#=E2 #/\ F3#=E3 #/\ F4#=E4).

And then use the optimisation options of labeling/2 to get largest values first:

?- c_s_mining(Fs, Value), labeling([max(Value)], Fs).
Fs = [0, 1, 0, 1],
Value = 2 ;
Fs = [1, 0, 0, 1],
Value = 1 ;
Fs = [0, 0, 0, 0],
Value = 0 ;
etc.

Notice also that I have removed some superfluous constraints, such as Value in inf..sup, since the constraint solver can figure them out on its own.


CLP(B): A declarative alternative for Boolean constraints

For the case of such Boolean patterns, also check out CLP(B): Constraint Logic Programming over Boolean variables, available for example in SICStus Prolog and SWI. Using CLP(B) requires you formulate the search a bit differently, since it lacks the powerful labeling options of CLP(FD). However, in contrast to CLP(FD), CLP(B) is complete and may detect inconsistencies as well as entailed constraints much earlier.

In the following code, I am using CLP(FD) to guide the search for optimal values, and then use CLP(B) to state the actual constraints. A final call of labeling/1 (note that this is from library(clpb), not to be confused with CLP(FD)'s labeling/2) is used to ensure ground values for all CLP(B) variables. At the point it appears, it is only a formality in some sense: We already know that there is a solution at this point, thanks to CLP(B)'s completeness.

:- use_module(library(clpb)).
:- use_module(library(clpfd)).

c_s_mining(Features, Value):-
    ExampleA = [0,1,0,1],
    ExampleB = [0,1,0,1],
    ExampleC = [1,0,0,1],

    ExampleD = [1,0,1,0],
    ExampleE = [1,0,1,0],
    ExampleQ = [0,1,1,0],

    same_length(Features, ExampleA),
    Positives = [ExampleA,ExampleB,ExampleC],
    Negatives = [ExampleD,ExampleE,ExampleQ],
    [TP,FP] ins 0..3, % (in this case)
    Value #= TP-FP,
    labeling([max(Value)], [TP,FP]),
    covers_number(Features, Positives, TP),
    covers_number(Features, Negatives, FP),
    labeling(Features).

covers_number(Features, Examples, Number):-
    maplist(covers_(Features), Examples, Numbers),
    sat(card([Number], Numbers)).

covers_([F1,F2,F3,F4], [E1,E2,E3,E4], Covered) :-
    sat(Covered =:= ((F1=:=E1)*(F2=:=E2)*(F3=:=E3)*(F4=:=E4))).
mat
  • 40,498
  • 3
  • 51
  • 78
  • Thanks that is great. One thing, Is it possible to also return a result such as Fs =[0,X,0,1], X in 0..1, Value =3. so that the matching has variables? – user27815 Sep 14 '15 at 13:39
  • I mean Fs =[Y,X,0,1] in this case. – user27815 Sep 14 '15 at 14:05
  • You can of course use partially instantiated patterns in the query, for example: `?- c_s_mining([0,X,0,1], V).`. In this concrete case, this yields two solutions, with different values depending on whether `X` is 1 or not. So, in general it is not possible to give a single solution for `V`, because it depends on the concrete features. For `Fs = [X,X,0,1]`, the CLP(B) version does what you want: `?- c_s_mining([X,X,0,1], V).` yields: `V = 0`, and constrains `X` to Boolean values without instantiating it. – mat Sep 14 '15 at 14:07
  • I would expect c_s_mining([X,X,0,1],V) to yield V =3 rather than V= 0, Because [X,X,0,1] unifies with three positive examples but no negative examples.. – user27815 Sep 14 '15 at 14:11
  • `[X,X|_]` means a list whose first 2 elements are *the same*, and there is no such example in either the positives or the negatives. – mat Sep 14 '15 at 14:13
  • Sorry yes I meant , Fs=[Y,X,0,1] but either way I guess it is not possible to do exactly what I want if it has to give concrete solutions. I would want it to generate all patterns including variables, not for me to give a specific instance. (And then return them in descending order of V) . – user27815 Sep 14 '15 at 14:18
  • You can adapt the CLP(B) version to yield variables if you just remove the `labeling/1` goal. Other than that, you can create ground solutions and later apply additional reasoning over the found patterns, searching for generalizations. This would warrant discussion in its own dedicated question though. – mat Sep 14 '15 at 14:22
  • Okay Thanks-ill have a play around and see how I go :) – user27815 Sep 14 '15 at 14:24