0

I'm trying to find out association rules in my Market Basket Analysis by applying FP-Growth. My concern is to find association rules by Date, means finding out item associations on daily basis for up to a year. I can design to get associations for couple of days but it's time consuming to design it for 356 days. Dataset is as follows. enter image description here

I have used the Market basket analysis template given in Rapidminer. enter image description here

How can I achieve this by couple of step rather than executing for each day up to a year?

Thanks

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process" origin="GENERATED_TEMPLATE">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.0.002" expanded="true" height="68" name="Retrieve Clustered Data with Items" width="90" x="45" y="187">
        <parameter key="repository_entry" value="Clustered Data with Items"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.0.002" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="187">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="ReceiptDate.eq.01/05/2017"/>
        </list>
      </operator>
      <operator activated="true" class="aggregate" compatibility="6.0.006" expanded="true" height="82" name="Aggregate" origin="GENERATED_TEMPLATE" width="90" x="112" y="336">
        <list key="aggregation_attributes">
          <parameter key="Orders" value="sum"/>
        </list>
        <parameter key="group_by_attributes" value="Invoice|product 1"/>
      </operator>
      <operator activated="true" class="pivot" compatibility="9.0.002" expanded="true" height="82" name="Pivot" origin="GENERATED_TEMPLATE" width="90" x="246" y="336">
        <parameter key="group_attribute" value="Invoice"/>
        <parameter key="index_attribute" value="product 1"/>
      </operator>
      <operator activated="true" class="rename_by_replacing" compatibility="9.0.002" expanded="true" height="82" name="Rename by Replacing" origin="GENERATED_TEMPLATE" width="90" x="380" y="336">
        <parameter key="attribute" value="Invoice"/>
        <parameter key="replace_what" value="sum\(Orders\)_"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="9.0.002" expanded="true" height="103" name="Replace Missing Values" origin="GENERATED_TEMPLATE" width="90" x="112" y="442">
        <parameter key="default" value="zero"/>
        <list key="columns"/>
      </operator>
      <operator activated="true" class="numerical_to_binominal" compatibility="6.0.003" expanded="true" height="82" name="Numerical to Binominal" origin="GENERATED_TEMPLATE" width="90" x="246" y="442"/>
      <operator activated="true" class="set_role" compatibility="9.0.002" expanded="true" height="82" name="Set Role" origin="GENERATED_TEMPLATE" width="90" x="380" y="442">
        <parameter key="attribute_name" value="Invoice"/>
        <parameter key="target_role" value="id"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="concurrency:fp_growth" compatibility="9.0.002" expanded="true" height="82" name="FP-Growth" origin="GENERATED_TEMPLATE" width="90" x="648" y="289">
        <parameter key="positive_value" value="true"/>
        <parameter key="min_support" value="0.005"/>
        <parameter key="find_min_number_of_itemsets" value="false"/>
        <enumeration key="must_contain_list"/>
      </operator>
      <operator activated="true" class="create_association_rules" compatibility="9.0.002" expanded="true" height="82" name="Create Association Rules" origin="GENERATED_TEMPLATE" width="90" x="648" y="442">
        <parameter key="min_confidence" value="0.1"/>
      </operator>
      <connect from_op="Retrieve Clustered Data with Items" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_op="Pivot" to_port="example set input"/>
      <connect from_op="Pivot" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
      <connect from_op="Rename by Replacing" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
      <connect from_op="Numerical to Binominal" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
      <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
      <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
      <connect from_op="Create Association Rules" from_port="item sets" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="147"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="42"/>
      <description align="left" color="yellow" colored="false" height="70" resized="false" width="850" x="20" y="25">MARKET BASKET ANALYSIS&lt;br&gt;Model associations between products by determining sets of items frequently purchased together and building association rules to derive recommendations.</description>
      <description align="left" color="blue" colored="true" height="185" resized="true" width="550" x="20" y="105">Step 1:&lt;br/&gt;Load transaction data containing a transaction id, a product id and a quantifier. The data denotes how many times a certain product has been purchased as part of a transactions.</description>
      <description align="left" color="purple" colored="true" height="341" resized="true" width="549" x="20" y="300">&lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; Step 2:&lt;br&gt;Edit, transform &amp;amp; load (ETL) - Aggregate transaction data to account for multiple occurrences of the same product in a transaction. Pivot the data so that each transaction is represented by a row. Transform purchase amounts to binary &amp;quot;product purchased yes/no &amp;quot; indicators.&lt;br&gt;</description>
      <description align="left" color="green" colored="true" height="310" resized="true" width="290" x="580" y="105">Step 3:&lt;br/&gt;Using FP-Growth, determine frequent item sets. A frequent item sets denotes that the items (products) in the set have been purchased together frequently, i.e. in a certain ratio of transactions. This ratio is given by the support of the item set.</description>
      <description align="left" color="green" colored="true" height="215" resized="true" width="286" x="579" y="425">&lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; Step 4:&lt;br/&gt;Create association rules which can be used for product recommendations depending on the confidences of the rules.&lt;br&gt;</description>
      <description align="left" color="yellow" colored="false" height="35" resized="true" width="849" x="20" y="655">Outputs: association rules, frequent item set&lt;br&gt;</description>
    </process>
  </operator>
</process>

Sample Data

user1220497
  • 311
  • 2
  • 4
  • 15

1 Answers1

0

After retrieving your data as an example set, you can then use the Loop Values operator to loop over the ReceiptDate attribute. The current value (in your case, the date) is stored in the loop_value macro.

You then put the whole process of building your association rules inside the subprocess and change your Filter Examples operator to condition class expression with ReceiptDate==%{loop_value} as the parameter expression.

This will filter your whole dataset, so you only keep examples of the current date, then build your model on that subset. As a result, you get a collection of different models on the out-port of Loop Values.

If you frequently build different models based on one (or more) parameter, it might be interesting to have a look at the Jackhammer extension by Old World Computing - the Indexed Model operator does exactly this for you (build specific models for different parameter values). It makes using these specific models a no-brainer, since you get one single model that you can then apply on your data - the model matching the parameters is automatically selected and applied.

Christian König
  • 3,437
  • 16
  • 28