
I am new to SPSS Modeler. I am trying to create a simple data transformation in Python on some dummy data I created.

[Flow screenshot]

The dummy data is created as expected (see at the bottom). I try to access and modify the data with Python, using the example that I found on the IBM website:

import spss.pyspark.runtime
from pyspark.sql.types import *

cxt = spss.pyspark.runtime.getContext()

if cxt.isComputeDataModelOnly():
    _schema = cxt.getSparkInputSchema()
    cxt.setSparkOutputSchema(_schema)
else:
    _structType = cxt.getSparkInputSchema()
    df = cxt.getSparkInputData()
    _newDF = df.sample(False, 0.01, 1)
    cxt.setSparkOutputData(_newDF)

When I press Preview to see the result, I get two errors:
  • Cannot get data model: null
  • No record was received
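One thing worth noting, independent of the errors above: `df.sample(False, 0.01, 1)` keeps each row independently with probability 0.01, so on a small dummy table the sampled output can legitimately come back empty even when everything else works. A stand-alone sketch of that behaviour in plain Python (the 30-row table here is hypothetical, not the actual dummy data):

```python
import random

# Approximate what df.sample(withReplacement=False, fraction=0.01, seed=1)
# does: each row is kept independently with probability ~0.01.
random.seed(1)
rows = list(range(30))                          # a small 30-row dummy table
kept = [r for r in rows if random.random() < 0.01]
print(len(kept))                                # often 0 for a table this small
```

With a 1% fraction, the expected number of kept rows is 0.3 for 30 input rows, so an empty preview is the most likely outcome.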


(Example source: https://www.ibm.com/support/knowledgecenter/da/SS3RA7_18.0.0/modeler_r_nodes_ddita/clementine/r_pyspark_api_examples.html)

[dummy data screenshot]

The whole setup looks like this: [screenshot]

Laca

2 Answers


I'd like to comment, but I don't have enough reputation, so I have to ask using an answer.

Are you using the correct syntax tab in the Extension Transform node? [Extension Transform screenshot]

When I use it like that, I get what I'd expect as the output. [Output data screenshot]


This code should just return your dataframe and print "Hello World" in the Console Output tab:

import spss.pyspark.runtime
from pyspark.sql.types import *

cxt = spss.pyspark.runtime.getContext()

if cxt.isComputeDataModelOnly():
    _schema = cxt.getSparkInputSchema()
    cxt.setSparkOutputSchema(_schema)
else:
    df = cxt.getSparkInputData()
    print("Hello World")
    cxt.setSparkOutputData(df)
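To illustrate why both branches matter: Modeler runs the script twice, first in a schema-only pass where `isComputeDataModelOnly()` returns true and no data flows, then in a data pass. Since `spss.pyspark.runtime` only exists inside Modeler, the sketch below uses a hypothetical mock context (not part of the real API) purely to show that contract:

```python
# MockContext is a hypothetical stand-in for the context returned by
# spss.pyspark.runtime.getContext(); it is NOT part of the real API.
class MockContext:
    def __init__(self, schema, rows, compute_model_only):
        self._schema = schema
        self._rows = rows
        self._model_only = compute_model_only
        self.output_schema = None
        self.output_rows = None

    def isComputeDataModelOnly(self):
        return self._model_only

    def getSparkInputSchema(self):
        return self._schema

    def getSparkInputData(self):
        return self._rows

    def setSparkOutputSchema(self, schema):
        self.output_schema = schema

    def setSparkOutputData(self, rows):
        self.output_rows = rows


def run_script(cxt):
    # Same branching as the extension script above.
    if cxt.isComputeDataModelOnly():
        # Pass 1: Modeler only wants to know the output schema.
        cxt.setSparkOutputSchema(cxt.getSparkInputSchema())
    else:
        # Pass 2: the actual data transformation.
        cxt.setSparkOutputData(cxt.getSparkInputData())


# Pass 1: schema negotiation, no data flows.
schema_pass = MockContext(["id", "value"], None, True)
run_script(schema_pass)
print(schema_pass.output_schema)   # ['id', 'value']

# Pass 2: data actually flows through.
data_pass = MockContext(["id", "value"], [(1, "a"), (2, "b")], False)
run_script(data_pass)
print(data_pass.output_rows)       # [(1, 'a'), (2, 'b')]
```

"Cannot get data model: null" suggests the schema pass itself never ran, which points at the environment rather than the script.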
pandayo
  • Yes, I am using it. I am running the whole thing on my local computer; there is no server. – Laca Jun 26 '18 at 13:53
  • I am not sure whether the whole setup is correct. If I simply enter a print 'test', it gives the same error. – Laca Jun 26 '18 at 14:16
  • Hi @Laca, your setup should work, since I use the same. Can you append a screenshot of the inside of the extension transform node (syntax tab only)? Also could you try the code I edited into my answer? This should return the input dataframe and print "Hello World" to the Console Output Tab. – pandayo Jun 27 '18 at 06:39
  • Hi pandayo! Thanks for your feedback. I added the whole setup to the original question. – Laca Jun 27 '18 at 08:34
  • I also tried the example with the Hello World, but it gives the same error. – Laca Jun 27 '18 at 08:35
  • I also tried to execute a small program in the execution window: `print("Hello World") import spss.pyspark.runtime`. Then I got an error: `Hello World Error: AEQMJ0132E: Script cannot load module spss on line 2 column 1`. So I guess the problem comes from the fact that spss cannot be found. – Laca Jun 27 '18 at 08:39
  • Is there a python folder in your SPSS installation folder? Which version of the modeler are you using? – pandayo Jun 27 '18 at 13:32
  • [My best guess is you are not fulfilling all of the requirements, e.g. making sure all users have permission to access the Python installation.](https://www.ibm.com/support/knowledgecenter/SS3RA7_18.0.0/modeler_r_nodes_ddita/clementine/r_intro_pyspark.html) – pandayo Jun 27 '18 at 13:41
  • Hi pandayo. I have a python folder. I am using IBM SPSS Modeler Subscription 1.0; I downloaded it 3 days ago and it says no updates are available. It can execute Python code in the stream execution window (however, it is not marked as "Python for Spark", but simply Python). I also tried to run the whole Modeler in administrator mode. – Laca Jun 28 '18 at 08:22
  • These are two different things: the stream execution window is able to use Jython to set up and run the stream, but not for data manipulation. – pandayo Jun 29 '18 at 09:09
  • Have you tried installing and using a python extension node from the extension node Store? – pandayo Jun 29 '18 at 09:10
  • Yes, I tried, and it gives the same error. I think there is a problem with the Python connection; it is simply not running any Python for Spark code. Even after a full reinstall it is not working, and I cannot see any error log, only the stupid "Cannot get data model: null". – Laca Jun 29 '18 at 13:25
  • I installed it on another computer, and it works like a charm! I still have no idea why it is not working on mine after a fresh install; probably because I have other Python-related programs installed (for example, Canopy). I leave the question open; maybe someone has some good ideas. – Laca Jun 29 '18 at 14:35

You can also try using the legacy mode in the same script tab. I always use legacy mode, and the code is similar to Clementine (the old version of SPSS Modeler).

Ref from IBM

Érica Wong