I would like to extract data using a window function in Apache Beam, with a one-day timeframe. I am working in Python and used FixedWindows to capture the data.

I have a problem with data consistency, because this code seems to measure the window duration from the pipeline's start timestamp:

beam.WindowInto(window.FixedWindows(1440*60))  # minutes in a day * 60 seconds

So that means that if I start the Beam pipeline on June 3 at 3:00 PM, the window will end on June 4 at 3:00 PM.

What I want is this: if I start the pipeline on June 3 at 3:00 PM, then when the time reaches June 4 at 12:00 AM, the window function should start a new window, right after June 3 at 11:59:59 PM.

Does anyone have an idea? Or does the window function not support this kind of work?

zzob

1 Answer

The windows are not based on the start time of the pipeline; they are aligned to the Unix epoch, so a one-day FixedWindows boundary falls at 00:00 UTC regardless of when the pipeline starts.
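To illustrate the epoch alignment, here is a minimal plain-Python sketch of the arithmetic that fixed windows use to assign a timestamp to a window. The helper name `fixed_window_bounds` and the year 2020 are assumptions made for the example; it mirrors the idea behind FixedWindows rather than calling Beam itself.

```python
from datetime import datetime, timezone

DAY = 24 * 60 * 60  # window size in seconds

def fixed_window_bounds(ts, size=DAY, offset=0):
    """Return (start, end) of the epoch-aligned window containing ts.

    Hypothetical helper: the start is the latest boundary at or before ts,
    where boundaries sit at offset, offset + size, offset + 2 * size, ...
    counted from the Unix epoch.
    """
    start = ts - (ts - offset) % size
    return start, start + size

# An element stamped June 3, 3:00 PM UTC (year assumed for the example).
event = int(datetime(2020, 6, 3, 15, 0, tzinfo=timezone.utc).timestamp())

start, end = fixed_window_bounds(event)
window_start = datetime.fromtimestamp(start, tz=timezone.utc)
window_end = datetime.fromtimestamp(end, tz=timezone.utc)

print(window_start.isoformat(), window_end.isoformat())
```

The element lands in the window running from midnight to midnight UTC, not in a window that begins at 3:00 PM.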

In your case, if you want the windows to align with calendar days, you can use CalendarWindows (part of the Java SDK). You'll just need to specify the time zone in which the days should be measured.

danielm

In that case, you can approximate this with FixedWindows. It has an offset parameter that lets you adjust when the windows start. You'll still end up with a bit of an issue because not all days are the same length, due to daylight saving time and leap seconds. – danielm Jun 05 '20 at 20:32
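
As a rough sketch of the offset approach, the snippet below works out the offset for a fixed UTC offset and checks where a timestamp would land. The UTC+7 zone, the year, and the helper `window_start` are assumptions for illustration only, and the arithmetic is done in plain Python rather than by running a pipeline. A fixed offset like this does not follow daylight saving changes.

```python
from datetime import datetime, timedelta, timezone

DAY = 24 * 60 * 60  # window size in seconds

# Assumption: local days are measured in a fixed UTC+7 zone with no DST.
UTC_OFFSET_HOURS = 7

# Local midnight at UTC+7 is 17:00 UTC, i.e. 61200 seconds after a UTC midnight.
offset = (-UTC_OFFSET_HOURS * 3600) % DAY

def window_start(ts, size=DAY, offset=0):
    """Hypothetical helper: start of the epoch-aligned window containing ts."""
    return ts - (ts - offset) % size

local = timezone(timedelta(hours=UTC_OFFSET_HOURS))

# An element stamped June 3, 3:00 PM local time (year assumed).
event = int(datetime(2020, 6, 3, 15, 0, tzinfo=local).timestamp())
start_local = datetime.fromtimestamp(window_start(event, offset=offset), tz=local)

print(offset, start_local.isoformat())

# In the pipeline, the same offset would be passed to the windowing step:
#   beam.WindowInto(window.FixedWindows(DAY, offset=offset))
```

With this offset, the window containing the element starts at local midnight on June 3, so a new window begins at local midnight on June 4.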