
I successfully applied a Neural Net operator in RapidMiner to a data set with three attribute columns and a fourth column that holds the label:

column1|column2|column3|column4(labelled)
data   |data   |data   |data  

Now I have a test data set, and I want to predict the value of the label column from column1, column2 and column3. The test data looks like this:

column1|column2|column3
data   |data   |data   

Question: is this correct?

Using this approach, I created a model so that the process can predict the value of the unlabelled column:

Without splitting data

Then, following the solution in the reference below:

Split data solution

I then created a second model using split data. For this, I combined my training and test data sets, so the combined data has label values for some rows and no label values for the rows that came from the test data.

With splitting data

But I am still getting this error.

KeenLearner

1 Answer

From what I can see, the problem is that you don't apply the Nominal to Numerical operator to your test set. With its default settings, this operator creates a dummy encoding: one new 0/1 attribute for each nominal value found in the specified attribute. In your case you will get a column/attribute named "Course1=A" that contains 1 for each example where the original column was "A" and 0 otherwise, and so on for the other values.
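To make the dummy encoding concrete, here is a minimal plain-Python sketch of the idea. It is not RapidMiner code; the function name `dummy_encode` and the sample rows are made up for illustration, and only the "Course1=A" naming pattern is taken from the description above.

```python
def dummy_encode(rows, attribute):
    """Replace one nominal attribute with 0/1 columns named 'attribute=value'."""
    # Collect every distinct nominal value that occurs in this data set.
    values = sorted({row[attribute] for row in rows})
    encoded = []
    for row in rows:
        # Keep all other attributes unchanged.
        new_row = {k: v for k, v in row.items() if k != attribute}
        # Add one 0/1 column per nominal value.
        for value in values:
            new_row[f"{attribute}={value}"] = 1 if row[attribute] == value else 0
        encoded.append(new_row)
    return encoded


# Hypothetical sample data, assumed only for this sketch.
rows = [{"Course1": "A", "Score": 70}, {"Course1": "B", "Score": 85}]
encoded = dummy_encode(rows, "Course1")
```

Note that the new column names depend on which values occur in the data being encoded, which is why encoding the training and test sets separately can produce different columns.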

What you need to do is apply the same encoding to your test data as to your training data. As you can see, the Nominal to Numerical operator has an additional output port called pre (short for preprocessing model). This model can be used to apply the same preprocessing steps (like normalization or encoding) to multiple data sets.
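The idea of a reusable preprocessing model can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the class `DummyEncoding` and the sample data are hypothetical and do not mirror RapidMiner's internals. The point is that the set of values is learned once from the training data and then reused for the test data.

```python
class DummyEncoding:
    """Hypothetical stand-in for a preprocessing model.

    It remembers the nominal values seen in the training data so that any
    later data set is encoded with exactly the same columns.
    """

    def __init__(self, attribute, train_rows):
        self.attribute = attribute
        self.values = sorted({row[attribute] for row in train_rows})

    def apply(self, rows):
        encoded = []
        for row in rows:
            new_row = {k: v for k, v in row.items() if k != self.attribute}
            # Always emit one column per *training* value, even if the
            # value never occurs in the data set being encoded.
            for value in self.values:
                new_row[f"{self.attribute}={value}"] = int(row[self.attribute] == value)
            encoded.append(new_row)
        return encoded


# Hypothetical sample data: the test set contains only one of the three values.
train = [
    {"Course1": "A", "Grade": 1},
    {"Course1": "B", "Grade": 0},
    {"Course1": "C", "Grade": 1},
]
test = [{"Course1": "B"}]

encoding = DummyEncoding("Course1", train)
train_encoded = encoding.apply(train)
test_encoded = encoding.apply(test)

# Apart from the label, both sets now have identical attribute columns.
train_columns = set(train_encoded[0]) - {"Grade"}
test_columns = set(test_encoded[0])
```

Encoding the test set on its own would have produced only a "Course1=B" column, which would not match what the trained model expects.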

For convenience, you can also group several models into one by using the Group Models operator.
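Conceptually, a grouped model just applies its member models one after another, so the preprocessing model and the learned model can be handled as a single unit. A minimal plain-Python sketch of that idea follows; the classes `GroupedModel`, `AddOne` and `Double` are hypothetical stand-ins, not RapidMiner operators.

```python
class AddOne:
    """Toy stand-in for a preprocessing model."""

    def apply(self, values):
        return [v + 1 for v in values]


class Double:
    """Toy stand-in for a prediction model."""

    def apply(self, values):
        return [v * 2 for v in values]


class GroupedModel:
    """Applies the contained models in the order they were given."""

    def __init__(self, *models):
        self.models = models

    def apply(self, values):
        for model in self.models:
            values = model.apply(values)
        return values


grouped = GroupedModel(AddOne(), Double())
result = grouped.apply([1, 2, 3])  # each value is incremented, then doubled
```

The order matters: the preprocessing step has to come first so that the later model sees data in the structure it was trained on.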

See the process XML below for an example (just copy and paste it into the process view of RapidMiner).

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
  <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Golf" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Golf"/>
  </operator>
  <operator activated="true" class="nominal_to_numerical" compatibility="8.2.000" expanded="true" height="103" name="Nominal to Numerical" width="90" x="179" y="34">
    <list key="comparison_groups"/>
    <description align="center" color="purple" colored="true" width="126">Transform the nominal attributes into a dummy encoding with 0/1 for each expression.&lt;br&gt;This encoding is then also delivered via &amp;quot;pre&amp;quot; output port.</description>
  </operator>
  <operator activated="true" class="neural_net" compatibility="8.2.000" expanded="true" height="82" name="Neural Net" width="90" x="447" y="34">
    <list key="hidden_layers"/>
  </operator>
  <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Golf-Testset" width="90" x="45" y="340">
    <parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
  </operator>
  <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="447" y="340">
    <list key="application_parameters"/>
  </operator>
  <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="648" y="340">
    <list key="application_parameters"/>
  </operator>
  <connect from_op="Retrieve Golf" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
  <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Neural Net" to_port="training set"/>
  <connect from_op="Nominal to Numerical" from_port="preprocessing model" to_op="Apply Model (2)" to_port="model"/>
  <connect from_op="Neural Net" from_port="model" to_op="Apply Model" to_port="model"/>
  <connect from_op="Retrieve Golf-Testset" from_port="output" to_op="Apply Model (2)" to_port="unlabelled data"/>
  <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Apply Model" to_port="unlabelled data"/>
  <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
  <portSpacing port="source_input 1" spacing="0"/>
  <portSpacing port="sink_result 1" spacing="0"/>
  <portSpacing port="sink_result 2" spacing="0"/>
  <description align="center" color="green" colored="true" height="103" resized="true" width="315" x="433" y="433">First apply the &amp;quot;preprocessing&amp;quot; model so the test data have the same structure&lt;br/&gt;&lt;br/&gt;Then apply the trained neural net</description>
</process>
</operator>
</process>


Also feel free to ask further questions, or re-post this one, in the RapidMiner community forum.

David