I am trying to do WARMR-style mining of frequent relational patterns in my data, and for this I am using Aleph in SWI-Prolog. However, I am struggling to figure out how to do this and why my previous attempts did not work.
I want to get a toy example working before I move on to my full data, so I took the toy "trains" data from the Aleph pack page: http://www.swi-prolog.org/pack/list?p=aleph
The Aleph manual says the following about the `ar` search:

> ar: Implements a simplified form of the type of association rule search conducted by the WARMR system (see L. Dehaspe, 1998, PhD Thesis, Katholieke Universitaet Leuven). Here, Aleph simply finds all rules that cover at least a pre-specified fraction of the positive examples. This fraction is specified by the parameter `pos_fraction`.
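To make the `pos_fraction` criterion in that passage concrete, here is a small SWI-Prolog sketch of the filter it describes. It is a hypothetical illustration, not Aleph's implementation: the predicates `pos/1`, `two_of_three/1`, `covered_fraction/2` and `frequent/2` are made up, and the sketch only checks the coverage of a candidate body that is already given, rather than searching for candidates as Aleph does.

```prolog
% Hypothetical illustration only: NOT Aleph's code.
% Keeps a candidate rule body if it holds for at least PosFraction
% of the positive examples, mirroring the manual's description of `ar`.

:- use_module(library(apply)).   % include/3
:- use_module(library(lists)).   % member/2

% Toy positive examples (hypothetical, modelled on the trains data).
pos(east1).
pos(east2).
pos(east3).

% A hypothetical candidate rule body, written as a unary test on a train.
two_of_three(T) :- member(T, [east1, east2]).

% covered_fraction(+Body, -Fraction):
% Fraction is the share of positive examples for which call(Body, E) succeeds.
covered_fraction(Body, Fraction) :-
    findall(E, pos(E), Pos),
    length(Pos, N),
    N > 0,
    include(Body, Pos, Covered),
    length(Covered, C),
    Fraction is C / N.

% frequent(+Body, +PosFraction):
% Body covers at least PosFraction of the positive examples.
frequent(Body, PosFraction) :-
    covered_fraction(Body, F),
    F >= PosFraction.
```

For example, `?- frequent(two_of_three, 0.5).` succeeds because two of the three positives are covered (2/3 ≥ 0.5), while `?- frequent(two_of_three, 0.9).` fails. With the five positives of the trains example and `pos_fraction` set to 0.01, a clause covering even a single example (1/5 = 0.2) already clears the threshold.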
Accordingly, I have inserted

```prolog
:- set(search,ar).
:- set(pos_fraction,0.01).
```

into the background file (and deleted `:- set(i,2).` from it), and I have erased the `.n` file of negative examples. I have also commented out all the determinations and the `modeh` declaration. My reasoning is that we are searching for frequent patterns, not rules: in a supervised setting the head would be an "output" variable and the body literals would be "inputs" trying to explain that output, whereas this is an unsupervised task.
Now, the original trains dataset is used to learn rules for "eastbound" trains. This is done with predicates like `car`, `shape`, `has_car(train, car)`, etc. Originally, all the background knowledge about these is in the `.b` file, the five positive examples (e.g. `eastbound(east1).`) are in the `.f` file, and five negative examples (e.g. `eastbound(west1).`) are in the `.n` file. Leaving the files unchanged (apart from the changes described above) and running `induce.` does not produce a sensible result: for example, it returns ground terms like `train(east1)` as a "rule". I have tried moving some of the background knowledge into the `.f` file, but that did not produce anything sensible either.
How do I go about constructing the `.f` and `.b` files? What should go into the positive examples file if we are not really looking to explain any positive examples (which would surely make it a supervised problem) but instead want to find frequent patterns in the data (an unsupervised problem)? Am I missing something?
Any help would be greatly appreciated.