While doing joins on 2 tables using Beam SQL then it's working properly provide expected performance but as my Joining Tables increases then the performance become worst.
Below is my snippet which might help you to debug my Joining condition in Beam SQL to get better performance.
PCollection<Row> outputStream2 = PCollectionTuple.of(new TupleTag<>("corporation1"), sourceData)
.and(new TupleTag<>("dim"), dimtable).and(new TupleTag<>("place"), placeData)
.and(new TupleTag<>("principle"), principle).apply(SqlTransform.query(
"Select d.merchant,d.corporation1,d.place,d.principal,c.corporation1_sk,r.place_sk,p.principal_sk FROM dim d LEFT JOIN corporation1 c ON c.corporation1 = d.corporation1 LEFT JOIN place p ON p.place = d.place and c.corporation1 = p.corporation1 "));
Any better way I can do Joins on Beam SQL/ Apache Beam because Sequential Joins in Table,
Where the previous output is responsible for next table Joins. I have tried using Co-GroupBy and SideInput mixed approach also where data in Table is below 5K I took the SideInput and where data is above 50K used Co-GroupBy for Joining but the performance is not upto the mark.