I have two Delta tables that I read, join, and write out again. Both have timestamp columns, so I use those as watermarks, and the join works without problems. However, as soon as I add a grouped aggregation, the stream no longer writes anything to the Delta table.

import pyspark.sql.functions as F
from pyspark.sql.functions import col, collect_list, expr, max as sparkMax

actionResult = actionResult.withColumnRenamed("timestamp", "secTimestamp") \
  .withWatermark("secTimestamp", "1 day")
combi = action.join(actionResult,
          (action.actionID == actionResult.actionID) &
           expr("secTimestamp < timestamp + interval 1 day"),
        how="left") \
  .drop("secTimestamp").drop(actionResult.actionID) # still able to write

combi = combi.withWatermark("timestamp", "1 day") \
  .groupby("ShipmentID", F.window("timestamp", "1 day", "1 day")) \
  .agg(sparkMax(col('timestamp')).alias("timestamp"),
    collect_list('HubID').alias('HubID')) # no results

Any insights on what might be wrong?

Alex Ott
Stefan
  • Not sure, but at first glance the brackets in your .agg at the end might be unbalanced? – pinegulf Sep 20 '21 at 07:40
  • @pinegulf thanks for your comment! But I think they are correct, both collect_list and sparkMax(col('timestamp')) should be inside the agg and the agg is closed at the end. – Stefan Sep 20 '21 at 07:44
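
One thing that may be worth checking, offered as a sketch rather than a diagnosis: in append output mode, a windowed aggregation only emits a window once the watermark has moved past the end of that window. With a 1 day window and a 1 day watermark delay, nothing is written for a given day until an event roughly two days later has been seen. The plain-Python simulation below illustrates that timing. The helper names (`window_start`, `simulate_append_mode`) and the sample event times are hypothetical, and the model is simplified: it advances the watermark after every event, whereas Spark advances it between micro-batches.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def window_start(ts, size):
    """Start of the tumbling window (aligned to the epoch) that contains ts."""
    return EPOCH + ((ts - EPOCH) // size) * size

def simulate_append_mode(event_times, window_size, delay):
    """Simplified model of append-mode windowed aggregation with a watermark.

    Returns a list of (event_time, emitted_window_starts) pairs, one per event.
    Assumption: the watermark is updated after every event, not per micro-batch.
    """
    open_windows = set()
    max_seen = None
    trace = []
    for ts in event_times:
        max_seen = ts if max_seen is None else max(max_seen, ts)
        watermark = max_seen - delay
        start = window_start(ts, window_size)
        # Rows whose window has already closed are treated as late and dropped.
        if start + window_size > watermark:
            open_windows.add(start)
        # A window is finalized (and emitted) once the watermark passes its end.
        closed = sorted(w for w in open_windows if w + window_size <= watermark)
        open_windows.difference_update(closed)
        trace.append((ts, closed))
    return trace

# Hypothetical event times, not taken from the question's data.
events = [
    datetime(2021, 9, 1, 10, 0),
    datetime(2021, 9, 1, 18, 0),
    datetime(2021, 9, 2, 9, 0),
    datetime(2021, 9, 3, 0, 30),
]
trace = simulate_append_mode(events, timedelta(days=1), timedelta(days=1))
for ts, emitted in trace:
    print(ts, "->", [w.date().isoformat() for w in emitted])
```

In this model the window for 1 September is only emitted when the event from 3 September arrives; the first three events produce no output. If the source tables stop receiving newer timestamps, the last windows stay open and nothing is written for them.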

0 Answers