
I use the official Recommendation template as a test. I completed these steps successfully (a rough sketch of the container layout follows the list):

  1. Installed the event server in a Docker container.
  2. Configured event data, metadata, and everything else to be stored in MySQL.
  3. Installed the train & deploy server in another Docker container.
  4. Installed a Spark standalone cluster in another container.
  5. Created a new app.
  6. Imported enough event data.
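
Here is a minimal sketch of that layout as plain docker commands. The container names predictionevent and predictionspark come from my logs; the image names, network, credentials, and ports are placeholders, not my real setup:

# hypothetical container layout; image names and credentials are placeholders
docker network create pio-net
# MySQL holding event data, metadata, and model data
docker run -d --name pio-mysql --network pio-net \
  -e MYSQL_ROOT_PASSWORD=secret -e MYSQL_DATABASE=predictionio mysql:5.7
# PredictionIO event server
docker run -d --name predictionevent --network pio-net -p 7070:7070 my-pio-image
# Spark standalone master (workers join it separately)
docker run -d --name predictionspark --network pio-net -p 7077:7077 my-spark-image
# train & deploy container; pio train / pio deploy run here
docker run -d --name pio-traindeploy --network pio-net -p 8000:8000 my-pio-image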

When I train and deploy as follows, everything works as described in the docs:

pio train
pio deploy

But when I use the Spark cluster and train and deploy as follows, training succeeds (the new model is stored in MySQL) but the deploy fails.

pio train -v engine.json -- --master spark://predictionspark:7077 --executor-memory 2G --driver-memory 2G --total-executor-cores 1
pio deploy -v engine.json --feedback --event-server-ip predictionevent --event-server-port 7070 --accesskey Th7k5gE5yEu9ZdTdM6KdAj0InDrLNJQ1U3qEBy7dbMnYgTxWx5ALNAa2hKjqaHSK -- --master spark://predictionspark:7077 --executor-memory 2G --driver-memory 2G --total-executor-cores 1

Full log:

[INFO] [Runner$] Submission command: /spark/bin/spark-submit --master spark://predictionspark:7077 --executor-memory 2G --driver-memory 2G --total-executor-cores 1 --class org.apache.predictionio.workflow.CreateWorkflow --jars file:/PredictionIO/lib/mysql-connector-java-5.1.46.jar,file:/ebsa/app/cf/target/scala-2.11/template-scala-parallel-recommendation_2.11-0.1-SNAPSHOT.jar,file:/ebsa/app/cf/target/scala-2.11/template-scala-parallel-recommendation-assembly-0.1-SNAPSHOT-deps.jar,file:/PredictionIO/lib/spark/pio-data-elasticsearch-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-hbase-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-jdbc-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-localfs-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-s3-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-hdfs-assembly-0.12.1.jar --files file:/PredictionIO/conf/log4j.properties --driver-class-path /PredictionIO/conf:/PredictionIO/lib/mysql-connector-java-5.1.46.jar --driver-java-options -Dpio.log.dir=/root file:/PredictionIO/lib/pio-assembly-0.12.1.jar --engine-id org.example.recommendation.RecommendationEngine --engine-version 0387c097c02018fa29109a8990b03d163249be00 --engine-variant file:/ebsa/app/cf/engine.json --verbosity 0 --json-extractor Both --env PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_MYSQL_PASSWORD=***,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/root/.pio_store,PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://***:3306/predictionio,PIO_HOME=/PredictionIO,PIO_FS_ENGINESDIR=/root/.pio_store/engines,PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=MYSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_MYSQL_USERNAME=***,PIO_FS_TMPDIR=/root/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=MYSQL,PIO_CONF_DIR=/PredictionIO/conf
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(cf,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [log] Logging initialized @6069ms
[INFO] [Server] jetty-9.3.z-SNAPSHOT
[INFO] [Server] Started @6184ms
[WARN] [Utils] Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
[INFO] [AbstractConnector] Started ServerConnector@2b53840a{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@422ad5e2{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1b3ab4f9{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1c8f6c66{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@151732fb{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40ed1802{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@feb098f{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@31e739bf{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7f42e06e{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2efd2f21{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@316cda31{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@17d2b075{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@310b2b6f{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@6b5ab2f2{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@6b2dd3df{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@73c48264{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5bcec67e{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7a2fce12{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4bb1b96b{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1f66d8e1{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3421debd{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@68b7d0ef{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@319642db{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@35bfa1bb{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2eda4eeb{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@309dcdf3{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@6a2d867d{/metrics/json,null,AVAILABLE,@Spark}
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: org.example.recommendation.DataSource@5db3d57c
[INFO] [Engine$] Preparator: org.example.recommendation.Preparator@395f52ed
[INFO] [Engine$] AlgorithmList: List(org.example.recommendation.ALSAlgorithm@26e0d39c)
[INFO] [Engine$] Data sanity check is on.
[INFO] [Engine$] org.example.recommendation.TrainingData does not support data sanity check. Skipping check.
[INFO] [Engine$] org.example.recommendation.PreparedData does not support data sanity check. Skipping check.
[WARN] [BLAS] Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
[WARN] [BLAS] Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
[INFO] [Engine$] org.apache.spark.mllib.recommendation.ALSModel does not support data sanity check. Skipping check.
[INFO] [Engine$] EngineWorkflow.train completed
[INFO] [Engine] engineInstanceId=0ac606dc-9959-40f8-9f40-d32354ebf221
[WARN] [TaskSetManager] Stage 1403 contains a task of very large size (1217 KB). The maximum recommended task size is 100 KB.
[WARN] [TaskSetManager] Stage 1404 contains a task of very large size (1767 KB). The maximum recommended task size is 100 KB.
[INFO] [CoreWorkflow$] Inserting persistent model
[INFO] [CoreWorkflow$] Updating engine instance
[INFO] [CoreWorkflow$] Training completed successfully.
[INFO] [AbstractConnector] Stopped Spark@2b53840a{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
[INFO] [Runner$] Submission command: /spark/bin/spark-submit --master spark://predictionspark:7077 --executor-memory 2G --driver-memory 2G --total-executor-cores 1 --class org.apache.predictionio.workflow.CreateServer --jars file:/PredictionIO/lib/mysql-connector-java-5.1.46.jar,file:/ebsa/app/cf/target/scala-2.11/template-scala-parallel-recommendation_2.11-0.1-SNAPSHOT.jar,file:/ebsa/app/cf/target/scala-2.11/template-scala-parallel-recommendation-assembly-0.1-SNAPSHOT-deps.jar,file:/PredictionIO/lib/spark/pio-data-elasticsearch-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-hbase-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-jdbc-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-localfs-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-s3-assembly-0.12.1.jar,file:/PredictionIO/lib/spark/pio-data-hdfs-assembly-0.12.1.jar --files file:/PredictionIO/conf/log4j.properties --driver-class-path /PredictionIO/conf:/PredictionIO/lib/mysql-connector-java-5.1.46.jar --driver-java-options -Dpio.log.dir=/root file:/PredictionIO/lib/pio-assembly-0.12.1.jar --engineInstanceId 0ac606dc-9959-40f8-9f40-d32354ebf221 --engine-variant file:/ebsa/app/cf/engine.json --ip 0.0.0.0 --port 8000 --event-server-ip predictionevent --event-server-port 7070 --accesskey Th7k5gE5yEu9ZdTdM6KdAj0InDrLNJQ1U3qEBy7dbMnYgTxWx5ALNAa2hKjqaHSK --feedback --json-extractor Both --env PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_MYSQL_PASSWORD=***,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/root/.pio_store,PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://***:3306/predictionio,PIO_HOME=/PredictionIO,PIO_FS_ENGINESDIR=/root/.pio_store/engines,PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=MYSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_MYSQL_USERNAME=***,PIO_FS_TMPDIR=/root/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=MYSQL,PIO_CONF_DIR=/PredictionIO/conf
[WARN] [WorkflowUtils$] Non-empty parameters supplied to org.example.recommendation.Preparator, but its constructor does not accept any arguments. Stubbing with empty parameters.
[WARN] [WorkflowUtils$] Non-empty parameters supplied to org.example.recommendation.Serving, but its constructor does not accept any arguments. Stubbing with empty parameters.
[INFO] [log] Logging initialized @6953ms
[INFO] [Server] jetty-9.3.z-SNAPSHOT
[INFO] [Server] Started @7086ms
[WARN] [Utils] Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
[INFO] [AbstractConnector] Started ServerConnector@d8ed4d9{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@307b5956{/jobs,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@62d50094{/jobs/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@8a644df{/jobs/job,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5a9054e7{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@15402e55{/stages,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@295b1de5{/stages/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@e7ac843{/stages/stage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@da15f73{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@506a8fc2{/stages/pool,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@fc4cf4d{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@255cef05{/storage,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e8f6bce{/storage/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e6427d4{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5fca5109{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2acbd47f{/environment,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@39004878{/environment/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@785b7109{/executors,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@f0fce80{/executors/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@19ab67fc{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@644558a3{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40fa6a20{/static,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@238b2adb{/,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4bbba0ce{/api,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3d1e4c06{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@70f8bf47{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@558311ee{/metrics/json,null,AVAILABLE,@Spark}
[INFO] [Engine] Using persisted model
[INFO] [Engine] Custom-persisted model detected for algorithm org.example.recommendation.ALSAlgorithm
[ERROR] [OneForOneStrategy] empty collection

I don't know why.

More: Instead of using the standalone cluster installed in another Docker container, I started a local cluster with Spark on the same container as the train & deploy server (as @user2906838 mentioned), and that works. I don't understand why it happens; I can't just use a local Spark, and it's very strange. The two cases are sketched below.
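
To make the two cases concrete, here is a sketch under the assumption that the only difference is where the Spark master runs:

# fails: master in a separate container, so model files written under /tmp
# land on the remote worker's filesystem, invisible to the deploy container
pio train -- --master spark://predictionspark:7077
# works: master on the same container as the train & deploy server, so
# file:///tmp resolves to the same filesystem for both train and deploy
pio train -- --master spark://localhost:7077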


More:

In the two situations, the files in the /tmp folder have different sizes.

[screenshot: /tmp contents in the successful case] [screenshot: /tmp contents in the failed case]

More:

It's interesting: I found the model data on the spark-worker container.

[screenshot: model files in /tmp on the spark-worker container]
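
A hedged way to confirm this from the Docker host; the container name spark-worker is an assumption based on the screenshot:

# list the model output that Spark wrote on the worker's local filesystem
docker exec spark-worker ls -l /tmp/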

  • Can you please provide more logs? Please give the full stack trace from the deploy console. Also, see this if it helps: https://medium.freecodecamp.org/building-an-recommendation-engine-with-apache-prediction-io-ml-server-aed0319e0d8 (I've provided the Spark example there as well). – user2906838 May 10 '18 at 10:59
  • @user2906838 I have pasted all the logs. I look forward to your reply, thanks. – user8036017 May 10 '18 at 15:04
  • I think there is something wrong with the train itself. Are you sure you have the right permissions for wherever the model is being saved during training? `[ERROR] [OneForOneStrategy] empty collection` doesn't tell much, but as far as my experience goes, I guess something goes wrong while saving the model. – user2906838 May 10 '18 at 15:30
  • @user2906838 I configured the model to be stored in MySQL, and after training I can find new model data stored in the table `pio_model_models`. – user8036017 May 10 '18 at 15:47
  • In that case, not a solution to your current problem, but why don't you try a plain installation, i.e. without Docker? As far as I know, the models are actually stored in a file in a specified directory. I've had some problems with pio deploy as well, and in my case the model saved in the /tmp folder was causing the issue due to improper permissions. – user2906838 May 10 '18 at 15:53
  • @user2906838 You are right! I checked the /tmp folder: after training there aren't enough *.crc files, just `_SUCCESS` and `._SUCCESS.crc`. So I should focus on the /tmp folder. But I run the command as root, so how could it be a permission issue? – user8036017 May 11 '18 at 00:32
  • pio shouldn't be run with sudo; I guess the documentation already says that somewhere. – user2906838 May 11 '18 at 00:40
  • @user2906838 I found the model data on the spark-worker container; please look at the last image. I am a beginner with Spark. I guess pio uses the Spark results on HDFS? If I install the Spark standalone cluster in the local container, the results are written to the local /tmp, so the deploy can get the data; if I install Spark in a remote container, it can't. This is all a beginner's guess. – user8036017 May 11 '18 at 06:26
  • The second image is right; yes, your guess could be true. To validate, I would suggest you first do a plain standalone installation from source and see. Even debugging is easier that way. – user2906838 May 11 '18 at 06:59
  • Since Spark uses the HDFS API to access files, if you configure HADOOP_CONF_DIR to point to the config files of your Hadoop cluster, Spark will attempt to access files on your HDFS when you leave out the scheme and host name in the URI, i.e. hdfs://predictionspark:9000/ in this case. If HADOOP_CONF_DIR is not configured, Spark will instead attempt to access files locally. (A minimal sketch of this is below.) – user8036017 May 22 '18 at 10:04
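
Following up on that last comment, here is a minimal sketch of pointing the train & deploy container at a shared HDFS so both steps see the same model files. The Hadoop config path and the namenode address are assumptions taken from the discussion, not verified values:

# assumption: a Hadoop client config exists in this container, with fs.defaultFS
# in core-site.xml set to hdfs://predictionspark:9000
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
# with HADOOP_CONF_DIR set, schemeless paths such as /tmp/<model-dir> resolve
# against HDFS instead of the container-local filesystem, so the output of
# train becomes visible to deploy
pio train -- --master spark://predictionspark:7077
pio deploy -- --master spark://predictionspark:7077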

1 Answer


The deploy step depends on the data that the train step writes to file:///tmp or hdfs:///tmp. If train and deploy do not share that filesystem, for example when the Spark workers run in a different container, deploy cannot find the model files. A quick way to check both locations is sketched below.
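
As a quick check, here is a hedged sketch of inspecting both locations after pio train; the exact model directory name varies, and the namenode address is an assumption:

# model output on the local filesystem of the train & deploy container
ls -l /tmp/
# the same check against HDFS, if a cluster is configured
hdfs dfs -ls hdfs://predictionspark:9000/tmp/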