I am looking for a way to implement a SparkCompute (or SparkSink) plugin that consumes multiple inputs.
Looking at the interfaces, both SparkCompute and SparkSink plugins appear to be limited to consuming only one input.
This is an excerpt from io.cdap.cdap.etl.api.batch.SparkCompute:
/**
 * Transform the input and return the output to be sent to the next stage in the pipeline.
 *
 * @param context {@link SparkExecutionPluginContext} for this job
 * @param input input data to be transformed
 * @throws Exception if there is an error during this method invocation
 */
public abstract JavaRDD<OUT> transform(SparkExecutionPluginContext context, JavaRDD<IN> input) throws Exception;
(note that only a single JavaRDD<IN> parameter appears in the method signature)
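For context, this is roughly what a single-input compute looks like today. It is only a minimal sketch: the class name and the use of StructuredRecord are placeholders, and the usual @Plugin/@Name annotations are omitted.

import io.cdap.cdap.api.data.format.StructuredRecord;
import io.cdap.cdap.etl.api.batch.SparkCompute;
import io.cdap.cdap.etl.api.batch.SparkExecutionPluginContext;
import org.apache.spark.api.java.JavaRDD;

// Minimal sketch: the framework hands transform() the single upstream RDD,
// so there is no obvious place to receive a second input.
public class SingleInputCompute extends SparkCompute<StructuredRecord, StructuredRecord> {

  @Override
  public JavaRDD<StructuredRecord> transform(SparkExecutionPluginContext context,
                                             JavaRDD<StructuredRecord> input) throws Exception {
    return input; // placeholder for the actual transformation logic
  }
}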
Is there any way to access all the inputs (via the SparkExecutionPluginContext context or something similar)?
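To make it concrete, something along these lines is what I am after. This is a purely hypothetical signature, not an existing CDAP API, where each upstream stage's RDD would be available separately:

// Purely hypothetical, not part of the CDAP API: one RDD per upstream stage,
// keyed by stage name, instead of a single merged JavaRDD<IN>.
public abstract JavaRDD<OUT> transform(SparkExecutionPluginContext context,
                                       Map<String, JavaRDD<IN>> inputs) throws Exception;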