It doesn't work because you're applying a SqlTransform to a pipeline, not to a PCollection.
You probably want to change it along these lines:
// the source probably returns a PCollection,
// so it would make sense to declare 'it' as a PCollection:
PCollection<...> it = p.apply(new SampleSource());
// then apply SqlTransform to the PCollection from the previous step,
// i.e. directly to 'it':
it.apply(SqlTransform.query(sql1));
...
How a Beam pipeline works, from a high-level perspective (sketched in code below):
- create a pipeline;
- apply an IO PTransform that reads from some source and produces a PCollection of the elements it reads;
- chain-apply more PTransforms to the PCollection from the previous step to process the data (conceptually, a different PCollection is produced at each step);
- repeat.
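A minimal sketch of that flow, assuming TextIO as the source and placeholder file paths:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

// the IO PTransform reads the source and produces the first PCollection
PCollection<String> lines = p.apply(TextIO.read().from("/path/to/input.txt"));

// each chained PTransform conceptually produces a new PCollection
PCollection<String> upper = lines.apply(
    MapElements.into(TypeDescriptors.strings()).via((String s) -> s.toUpperCase()));

upper.apply(TextIO.write().to("/path/to/output"));
p.run().waitUntilFinish();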
SqlTransform is a normal PTransform: it is expected to be applied to a PCollection of elements and to output another PCollection as a result. The query that you specify in SqlTransform.query() is applied to that PCollection. The query expects the data to come from a magical PCOLLECTION table that represents the PCollection you apply the SqlTransform to.
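For example (a sketch; 'words' here is assumed to be a schema-aware PCollection<Row> with a string field 'word' and an int32 field 'len'):

// PCOLLECTION in the query refers to 'words' itself:
PCollection<Row> filtered = words.apply(
    SqlTransform.query("SELECT word, len FROM PCOLLECTION WHERE len > 3"));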
What you are doing in your example is different:
- create a pipeline;
- apply a source PTransform that produces a POutput, not necessarily a PCollection;
- then ignore the output of your source, and instead take the original pipeline and apply a SqlTransform directly to it.
So what happens is that the SqlTransform in this case is applied to the 'root' of the pipeline, not to the PCollection that comes out of the source. Instead of a chain of PTransforms applied one after another, you now have two PTransforms applied to the root independently of each other.
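In code, that shape looks roughly like this (a sketch, reusing the SampleSource and sql1 names from above):

// two independent branches hanging off the pipeline root:
p.apply(new SampleSource());        // the source's output is never used
p.apply(SqlTransform.query(sql1));  // SQL gets no input PCollection to read from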
One more caveat is that SqlTransform expects input elements to be Rows, because SQL as a language works only on data that is represented as rows. There are two ways to achieve this:
- manually convert the elements produced by the source to Rows by applying another ParDo between the source and the SqlTransform (sketched below);
- use Beam's Schema framework (e.g. check out the PCollection.setSchema() method), which allows Beam SQL to automatically convert input elements into Rows.
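A sketch of the first option, assuming the source produces a PCollection<KV<String, Integer>> named 'source' (the schema and field names are made up for illustration; MapElements is used here instead of a raw ParDo):

import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TypeDescriptors;

Schema schema = Schema.builder()
    .addStringField("word")
    .addInt32Field("wordCount")
    .build();

// convert each element to a Row and attach the schema,
// so that Beam SQL knows the field names and types:
PCollection<Row> rows = source
    .apply(MapElements.into(TypeDescriptors.rows())
        .via((KV<String, Integer> kv) -> Row.withSchema(schema)
            .addValues(kv.getKey(), kv.getValue())
            .build()))
    .setRowSchema(schema);

// now the query can read from the PCOLLECTION table:
PCollection<Row> result = rows.apply(
    SqlTransform.query("SELECT word, wordCount FROM PCOLLECTION WHERE wordCount > 1"));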