In RapidMiner Studio 9.5.1, after my Python script completes, I can print the resulting DataFrame and see that it is produced as expected, with the proper columns. Yet the RapidMiner processor fails with the message:
Exception: com.rapidminer.operator.OperatorException
Message: Script terminated abnormally: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Stack trace:
com.rapidminer.extension.pythonscripting.operator.scripting.AbstractScriptRunner.run(AbstractScriptRunner.java:137)
com.rapidminer.extension.pythonscripting.operator.scripting.AbstractScriptingLanguageOperator.doWork(AbstractScriptingLanguageOperator.java:210)
com.rapidminer.extension.pythonscripting.operator.scripting.python.PythonScriptingOperator.doWork(PythonScriptingOperator.java:434)
com.rapidminer.operator.Operator.execute(Operator.java:1032)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:77)
com.rapidminer.operator.ExecutionUnit$2.run(ExecutionUnit.java:812)
com.rapidminer.operator.ExecutionUnit$2.run(ExecutionUnit.java:807)
java.security.AccessController.doPrivileged(Native Method)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:807)
com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:423)
com.rapidminer.operator.Operator.execute(Operator.java:1032)
com.rapidminer.Process.executeRoot(Process.java:1378)
com.rapidminer.Process.lambda$executeRootInPool$5(Process.java:1357)
com.rapidminer.studio.concurrency.internal.AbstractConcurrencyContext$AdaptedCallable.exec(AbstractConcurrencyContext.java:328)
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
The message gives no further insight and does not reference any line of my script. I have updated the numpy library in case this was a compatibility problem with an older version, but that did not solve it. The installed versions are:
numpy 1.14.5 pypi_0 pypi
numpy-base 1.16.4 py36hc3f5095_0 defaults
numpydoc 0.9.1 py_0 defaults
pandas 0.25.3 py36ha925a31_0 defaults
Also, when I check the Python environment (an Anaconda env) under Settings > Preferences > Python Scripting in RapidMiner, all tests pass.
The processor XML from the .rmp file is:
<operator activated="true" class="python_scripting:execute_python" compatibility="9.5.000" expanded="true" height="103" name="Execute Python" width="90" x="313" y="34">
<parameter key="script" value="import pandas # rm_main is a mandatory function, # the number of arguments has to be the number of input ports (can be none) def rm_main(data): print('Hello, world!') # output can be found in Log View print(type(data)) #your code goes here #for example: data2 = pandas.DataFrame([3,5,77,8]) # connect 2 output ports to see the results return data, data2"/>
<parameter key="script_file" value="%{ResourcePath}\detect_aggressive_language.py"/>
<parameter key="notebook_cell_tag_filter" value=""/>
<parameter key="use_default_python" value="true"/>
<parameter key="package_manager" value="conda (anaconda)"/>
<description align="center" color="transparent" colored="false" width="126">Detect Script</description>
</operator>
So far, I have tried:
1. Update the initial DataFrame (data) with my computed columns and return it.
2. Create a new DataFrame with my columns and return it, either alone or as the second return value after data.
3. Create a function (within the script) that accepts the initial DataFrame data as an argument, modifies it, and then returns it.
4. Pickle the new DataFrame, save it, load it, and return it.
All of these attempts resulted in the same error shown above.
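To make the shape of attempts 1 to 3 concrete, here is a minimal sketch. It is not the real detect_aggressive_language.py: the column names `text` and `is_aggressive`, the helper `add_columns`, and the keyword check are all placeholders assumed only for illustration.

```python
import pandas as pd


def add_columns(data):
    # Attempt 3: a helper that takes the input DataFrame, modifies it, and returns it.
    # The keyword test is placeholder logic, not the real detection code.
    data["is_aggressive"] = data["text"].str.contains("hate")
    return data


def rm_main(data):
    # Attempt 1: update (a copy of) the input DataFrame with the computed column.
    updated = add_columns(data.copy())
    # Attempt 2: build a separate DataFrame holding only the computed column.
    computed = pd.DataFrame({"is_aggressive": updated["is_aggressive"]})
    # Two return values correspond to two connected output ports.
    return updated, computed
```

Outside RapidMiner this runs without error, which matches what I see when I print the result from inside the script.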
My guess is that RapidMiner performs some kind of check when the processor completes, and that this check is what raises the error above, so the processor terminates.
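One way to probe this guess: NumPy raises this ValueError when an array with more than one element is used in a boolean context, so a check on the returned DataFrame might be tripping over cells that hold lists or arrays rather than scalars. That is an assumption, not something confirmed from the RapidMiner source. The sketch below looks for such columns before returning; `find_non_scalar_columns` is a hypothetical helper name and the example data is made up.

```python
import numpy as np
import pandas as pd


def find_non_scalar_columns(df):
    """Return the names of columns where at least one cell holds a container."""
    non_scalar_types = (list, tuple, dict, set, np.ndarray, pd.Series)
    suspects = []
    for name in df.columns:
        column = df[name]
        # Only object-dtype columns can hold arbitrary Python objects.
        if column.dtype != object:
            continue
        if column.map(lambda cell: isinstance(cell, non_scalar_types)).any():
            suspects.append(name)
    return suspects


# Hypothetical example data, not taken from the real script.
example = pd.DataFrame({
    "text": ["first message", "second message"],
    "score": [0.1, 0.9],
    "tokens": [["first", "message"], ["second"]],
})
print(find_non_scalar_columns(example))  # ['tokens']
```

If a column shows up here, converting it to plain strings or numbers (or dropping it) before the return statement would show whether it is the cause.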
Is there a proper way to handle and return DataFrames in RapidMiner that avoids this error, or is there anything else I could examine to find out where the problem lies?