clpfd coverage algorithm speed improvements?

Question

This is a follow-up question to "Can you use clpfd to implement a coverage algorithm?"

I have put the code here: http://swish.swi-prolog.org/p/Coverage%20using%20Constraints%20%20.pl

There are two search procedures.

start_exhaustive_search(Positives,Negatives,r(Features, Value,cm(TP,FP)))

And a heuristic search:

start_search(Ps,Ns,Result).

The heuristic search refines a rule until it no longer covers any negatives. cm stands for confusion matrix.
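To make the cm(TP,FP) term concrete, here is a minimal plain-Prolog sketch of how such a confusion matrix could be computed for one candidate rule. It assumes (this is not taken from the linked program) that a rule is a 0/1 feature mask and that it covers an example when every feature selected in the mask is also set in the example. The names implies/2, covers/2 and confusion/4 are hypothetical.

```prolog
:- use_module(library(apply)).   % include/3, maplist/3

% implies(+M, +E): for 0/1 values, M =< E holds exactly when M implies E.
implies(M, E) :- M =< E.

% covers(+Mask, +Example): assumed coverage test. Every feature that is
% selected in Mask (value 1) must also be set in Example.
covers(Mask, Example) :-
    maplist(implies, Mask, Example).

% confusion(+Mask, +Positives, +Negatives, -CM): CM is cm(TP, FP), where
% TP and FP are the numbers of covered positives and covered negatives.
confusion(Mask, Ps, Ns, cm(TP, FP)) :-
    include(covers(Mask), Ps, CoveredPs),
    length(CoveredPs, TP),
    include(covers(Mask), Ns, CoveredNs),
    length(CoveredNs, FP).
```

With the toy data Ps = [[1,1,0],[1,0,1],[0,1,1]] and Ns = [[1,0,0],[0,0,1]], the query `?- confusion([1,0,0], Ps, Ns, CM).` gives CM = cm(2,1), and the more specific mask [1,1,0] gives cm(1,0). That is the kind of refinement step the heuristic search repeats until FP reaches 0.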

There are three ways to test the predicates. The first is a small database accessible with pos(Ps) and negs(Ns). The second is a larger database accessible with approved(Ps) and notapproved(Ns); it also comes with a predicate, binary_to_features(Binary,Features), that turns the binary representation of the used features into a list of named features. The third is a random matrix of examples generated with random_binary_matrix_x_y(X,Y,R) (with X as 9, the result is compatible with the larger approved/notapproved example).

Example exhaustive query:

?- approved(Ps), notapproved(Ns), start_exhaustive_search(Ps, Ns, Result).
Result = r([0, 0, 0, 0, 0, 0, 0, 1, 0, 0], 21, cm(6, 1)).

Example heuristic query:

?- approved(Ps), notapproved(Ns), start_search(Ps, Ns, Result).
Result = [r([0, 0, 0, 0, 0, 0, 0, 1, 0, 0], 21, cm(6, 1)), r([0, 0, 0, 0, 0, 0, 0, 1, 0, 1], 20, cm(4, 0))].

Neither method seems to be as fast as I would expect to be possible using constraints. Is there a way to improve the speed?

Also, I am curious why I can't use dif/2 but have to use \== on line 98.

I am using card/2 to count the number of examples covered. Is there another way to use it that I am not seeing?
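For reference, a small sketch of counting with card/2 from SWI-Prolog's library(clpb). The first argument of card/2 has to be a list of integers or From-To ranges, so this sketch enumerates candidate counts with between/3 instead of leaving the count unbound. The predicate names count_true/2 and covered_between/3 are hypothetical and are not part of the linked program.

```prolog
:- use_module(library(clpb)).

% count_true(+Flags, ?N): exactly N of the Boolean expressions in Flags
% are true. between/3 supplies a concrete N, because card/2 needs the
% admissible cardinalities to be given as integers or ranges.
count_true(Flags, N) :-
    length(Flags, Len),
    between(0, Len, N),
    sat(card([N], Flags)).

% covered_between(+Min, +Max, +Flags): between Min and Max (inclusive)
% of the Boolean expressions in Flags are true.
covered_between(Min, Max, Flags) :-
    sat(card([Min-Max], Flags)).
```

For example, `?- count_true([1,0,1], N).` gives N = 2, and `?- covered_between(2, 3, [A,B,C]), sat(~A), labeling([A,B,C]).` leaves only A = 0, B = 1, C = 1. Enumerating N this way creates one choice point per candidate count, so it may or may not help with speed on larger instances.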

Do you really need the optimum, or are you content with a best effort solution? If the latter, you can use `call_with_time_limit/2` around the `labeling/2` goal to yield the best solution found within the time limit. — mat, Sep 19 '15 at 11:35
Ideally it would be good to find the optimum, but if it is not possible then it might be a good approach. However when I try replacing `labeling([max(Value)], [TP,FP])`, with for example `call_with_time_limit(500, labeling([max(Value)],[TP,FP]))` in the c_s_mining predicate, the top query returns `false` not the best answer found? — user27815, Sep 19 '15 at 11:44
In your case, you need a few more changes, since there may not be a solution for a particular value. So, you can either simulate the labeling, searching manually for increasingly better solutions, or rewrite the program in such a way that you know there is a solution for particular values. One thing that may help you is to relax some `sat/1` constraints, using for example `card(Nums,Exprs)` to state the cardinality of true formulas in `Exprs`. On a general note, such covering problems are intrinsically hard, and it is unlikely that you find provably optimum solutions for very large instances. — mat, Sep 19 '15 at 12:54
One thing I thought of is that a feature set A = [1,1,0,0] cannot cover more examples than a feature set B = [1,0,0,0], as it is a specialisation and coverage can only go down. Is there a way to encode this as a constraint that might prune some of the search space? — user27815, Sep 22 '15 at 08:32
It's good to know that you are still thinking about this! I also want to work on this but currently cannot look into it. One strategy is to first find an initial cover, and then to gradually increase the limit and search for better covers. You may have some luck with using `card/2` constraints of CLP(B), however, the main limitation is that they only work in one direction: A list of instantiated numbers must already be given. Still, I recommend to experiment with CLP(B) a bit more. You can also try a SICStus evaluation copy (free of charge!) for often much better performance than SWI-Prolog. — mat, Sep 22 '15 at 08:44
I am experimenting with card/2, but I am not sure how best to use it yet. — user27815, Sep 22 '15 at 12:59
Please make sure that the code is completely self-contained, so that others can easily try it out: Add the necessary `use_module/1` declarations, and clearly separate the code from the sample queries. — mat, Sep 23 '15 at 07:21
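
The best-effort idea from the comments above (search manually for increasingly better solutions and stop at a time limit) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the linked program: best_effort/5, improve/3, better_than/2 and toy/2 are hypothetical names, and the global variable best_so_far is an implementation choice of the sketch. Goal is assumed to post its own constraints and label its variables, so that Value and Template are ground whenever it succeeds.

```prolog
:- use_module(library(clpfd)).
:- use_module(library(time)).    % call_with_time_limit/2

% best_effort(+Seconds, ?Template, ?Value, :Goal, -Best)
% Best is BestValue-BestTemplate for the best solution of Goal found
% before the time limit, or the optimum if the search finished earlier.
% Fails if no solution was found at all.
best_effort(Seconds, Template, Value, Goal, Best) :-
    nb_setval(best_so_far, none),
    catch(call_with_time_limit(Seconds, improve(Template, Value, Goal)),
          time_limit_exceeded,
          true),                 % on timeout, keep what has been stored
    nb_getval(best_so_far, Best),
    Best \== none.

% Repeatedly ask for a solution that is strictly better than the stored
% one. The double negation undoes all bindings and constraints after
% each round; only the copy stored by nb_setval/2 survives.
improve(Template, Value, Goal) :-
    repeat,
    nb_getval(best_so_far, Prev),
    (   \+ \+ ( better_than(Prev, Value),
                call(Goal),
                nb_setval(best_so_far, Value-Template) )
    ->  fail                     % improved: loop and try to improve again
    ;   !                        % nothing better exists: optimum reached
    ).

better_than(none, _).
better_than(V0-_, V) :- V #> V0.

% A toy goal standing in for the real mining predicate.
toy(X-Y, V) :-
    [X,Y] ins 0..5,
    X + Y #=< 7,
    V #= 2*X + Y,
    label([X,Y]).
```

For the toy goal, `?- best_effort(5, X-Y, V, toy(X-Y, V), Best).` finishes well inside the limit with Best = 12-(5-2). On a hard instance the result is simply whatever was stored when the limit expired, with no guarantee of optimality.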
