Questions tagged [windowing]

In signal processing, windowing makes it possible to work on overlapping segments of a signal.
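As a rough illustration of the signal-processing sense of the tag, the sketch below cuts a signal into overlapping frames and tapers each frame with a window function. It is plain Python. The Hann taper, the frame length and the hop are assumptions chosen for the example, and `frame_signal` is a hypothetical helper name.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def frame_signal(signal, frame_len, hop):
    """Cut `signal` into frames of `frame_len` samples that start every `hop`
    samples (overlapping when hop < frame_len) and taper each frame with a
    Hann window. Trailing samples that do not fill a whole frame are dropped."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


# 10 samples, frames of 4 samples, hop of 2 -> 50% overlap between frames
signal = [1.0] * 10
frames = frame_signal(signal, frame_len=4, hop=2)
print(len(frames))  # 4
```

A hop of half the frame length gives 50% overlap. The taper reduces the edge discontinuities that an abrupt cut would introduce into each frame.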

224 questions
4 votes, 3 answers

Clean way to identify runs on a PySpark DF ArrayType column

Given a PySpark DataFrame of the form:
+----+--------+
|time|messages|
+----+--------+
| t01|    [m1]|
| t03|[m1, m2]|
| t04|    [m2]|
| t06|    [m3]|
| t07|[m3, m1]|
| t08|    [m1]|
| t11|    [m2]|
| t13|[m2, m4]|
| t15|    [m2]|
| t20|    [m4]|
|…
Jedi
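The question's expected output is truncated. This sketch therefore assumes that a "run" is a maximal stretch of consecutive rows (ordered by `time`) whose `messages` array contains the same message. It is plain Python rather than PySpark and uses the rows shown in the question with a hypothetical `find_runs` helper. It mirrors the usual gaps-and-islands trick: explode the array, then compare a global row number with a per-message row number.

```python
from itertools import groupby

# The rows shown in the question, already ordered by time.
rows = [
    ("t01", ["m1"]), ("t03", ["m1", "m2"]), ("t04", ["m2"]),
    ("t06", ["m3"]), ("t07", ["m3", "m1"]), ("t08", ["m1"]),
    ("t11", ["m2"]), ("t13", ["m2", "m4"]), ("t15", ["m2"]),
    ("t20", ["m4"]),
]


def find_runs(rows):
    """Return (message, first_time, last_time) for each maximal run of
    consecutive rows whose `messages` array contains that message."""
    # "Explode": one (message, row_position, time) record per array element.
    exploded = sorted(
        (msg, pos, time)
        for pos, (time, messages) in enumerate(rows)
        for msg in messages
    )
    runs = []
    for msg, group in groupby(exploded, key=lambda rec: rec[0]):
        # Within one message, row_position minus the per-message rank stays
        # constant across consecutive rows, so it works as a run key.
        keyed = [(pos - rank, time) for rank, (_, pos, time) in enumerate(group)]
        for _, run in groupby(keyed, key=lambda rec: rec[0]):
            times = [time for _, time in run]
            runs.append((msg, times[0], times[-1]))
    return sorted(runs, key=lambda r: (r[1], r[0]))


for run in find_runs(rows):
    print(run)
```

In PySpark the same steps would typically map onto `explode`, a `row_number()` over a window ordered by `time`, and a second `row_number()` partitioned by message. The difference between the two numbers is the run key. Note that an unpartitioned window moves all rows to a single partition.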
4 votes, 1 answer

Flink Tumble Window Trigger time

I am using Flink to aggregate data from Kafka topics, with a tumbling window of 1 hour and the time characteristic set to event time. I am also using AscendingTimestampExtractor and assigning watermarks to the input based on a particular…
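A simplified simulation of when an event-time tumbling window fires. It assumes a single input whose watermark is the highest timestamp seen minus 1 ms, which is the rule `AscendingTimestampExtractor` follows. It also assumes a window [start, end) fires once the watermark reaches its last timestamp, `end - 1`. The helper names and event times are hypothetical, and the watermark is advanced per element instead of periodically.

```python
HOUR_MS = 60 * 60 * 1000


def window_for(ts, size=HOUR_MS):
    """Tumbling window [start, end) that contains ts, with zero offset."""
    start = ts - (ts % size)
    return start, start + size


def firing_events(timestamps, size=HOUR_MS):
    """Simulate event-time tumbling windows on one input whose watermark is
    (highest timestamp seen - 1 ms). A window fires once the watermark
    reaches end - 1. Returns (window, element_count, ts_that_triggered)."""
    watermark = None
    pending = {}  # window -> number of buffered elements
    fired = []
    for ts in timestamps:
        win = window_for(ts, size)
        pending[win] = pending.get(win, 0) + 1
        watermark = ts - 1 if watermark is None else max(watermark, ts - 1)
        for w in sorted(pending):
            if watermark >= w[1] - 1:
                fired.append((w, pending.pop(w), ts))
    return fired


# Events at minutes 5, 50, 65 and 130, in milliseconds
events = [5 * 60_000, 50 * 60_000, 65 * 60_000, 130 * 60_000]
for fire in firing_events(events):
    print(fire)
```

The window holding the 130-minute event stays open until a later event pushes the watermark past its end. This is a common reason an hourly window seems to fire late. In a real job, periodic watermarks are emitted at an interval. An operator with several inputs also uses the minimum of their watermarks, so one idle Kafka partition can hold windows back.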
4 votes, 1 answer

Apache Flink: Watermarks, Dropping Late Events, and Allowed Lateness

I am having trouble understanding the concept of watermarks and allowed lateness. Following is an excerpt from the [mail archive|https://www.mail-archive.com/user@flink.apache.org/msg08758.html] that talks about Watermarks but I have a couple of…
Sheel Pancholi
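A simplified model of how an event-time window operator treats an incoming element, following the behaviour Flink's documentation describes for allowed lateness. Elements for a window whose end plus the allowed lateness the watermark has already passed are dropped. Late elements within the allowed lateness are still added and make the window fire again under the default event-time trigger. Treating the window's last timestamp as `end - 1` is an assumption of this sketch, and `classify` is a hypothetical helper.

```python
def classify(event_ts, watermark, window_size, allowed_lateness):
    """Fate of an element reaching an event-time tumbling window operator
    whose current watermark is `watermark` (simplified; values in ms)."""
    start = event_ts - (event_ts % window_size)
    max_ts = start + window_size - 1        # last timestamp inside [start, end)
    if max_ts + allowed_lateness <= watermark:
        return "dropped"                    # window state already cleaned up
    if max_ts <= watermark:
        return "late, window fires again"   # within allowed lateness
    return "on time"                        # window has not fired yet


# An element at 12 s, 10 s windows, 5 s allowed lateness, three watermarks
for wm in (15_000, 21_000, 25_000):
    print(wm, classify(12_000, wm, window_size=10_000, allowed_lateness=5_000))
```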
4 votes, 2 answers

SQL - Flattening a table using windowing and case statements

First-time asker here. I'm having some problems combining CASE logic and windowing in SQL Server 2012. I need to flatten the data structure shown below, so I'll be running MAX statements against these results afterwards. I'm using CASE/WHEN logic to…
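A common way to flatten rows into columns is to number each key's rows with a windowed `ROW_NUMBER()` and then apply `MAX(CASE WHEN ...)` grouped by the key. The sketch below runs this against an in-memory SQLite database, which needs SQLite 3.25 or later for window functions. The `phones` table and its data are hypothetical, since the question's actual structure is not shown here. The query shape itself also runs on SQL Server 2012.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phones (person_id INTEGER, phone TEXT, added TEXT);
INSERT INTO phones VALUES
  (1, '555-0100', '2020-01-01'),
  (1, '555-0101', '2020-02-01'),
  (2, '555-0200', '2020-01-15'),
  (2, '555-0201', '2020-03-01'),
  (2, '555-0202', '2020-04-01');
""")

FLATTEN = """
WITH numbered AS (
    -- number each person's phones in the order they were added
    SELECT person_id, phone,
           ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY added) AS rn
    FROM phones
)
SELECT person_id,
       MAX(CASE WHEN rn = 1 THEN phone END) AS phone1,
       MAX(CASE WHEN rn = 2 THEN phone END) AS phone2,
       MAX(CASE WHEN rn = 3 THEN phone END) AS phone3
FROM numbered
GROUP BY person_id
ORDER BY person_id
"""
rows = conn.execute(FLATTEN).fetchall()
for row in rows:
    print(row)
```

Each `CASE` yields a value on exactly one row per group and NULL elsewhere. `MAX` then collapses the group to that single value.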
4 votes, 1 answer

How to use a context window to segment a whole log Mel-spectrogram (ensuring the same number of segments for all the audio files)?

I have several audio files with different durations, so I don't know how to ensure the same number N of segments for every file. I'm trying to implement an existing paper, which says that a log Mel-spectrogram is first computed over the whole audio with 64…
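One way to get the same number N of segments from clips of different lengths is to keep the window width fixed and let the hop vary per clip. The N start positions are spread evenly between the first and the last possible frame. This plain-Python sketch shows that idea only and is not necessarily the method used in the paper the question refers to. `segment_starts`, `segment`, the 64-frame window and the 8-bin frames are illustrative assumptions.

```python
def segment_starts(num_frames, win, n_segments):
    """Start frame of each of n_segments windows of `win` frames, spread
    evenly so the first starts at frame 0 and the last ends at the final
    frame. Assumes num_frames >= win; the hop adapts to each clip's length."""
    if n_segments == 1:
        return [(num_frames - win) // 2]   # single centred segment
    span = num_frames - win
    return [round(i * span / (n_segments - 1)) for i in range(n_segments)]


def segment(spectrogram, win, n_segments):
    """Split a spectrogram (a list of frames, each a list of mel bins) into
    exactly n_segments windows of `win` frames."""
    if len(spectrogram) < win:
        raise ValueError("pad the spectrogram to at least `win` frames first")
    starts = segment_starts(len(spectrogram), win, n_segments)
    return [spectrogram[s:s + win] for s in starts]


# Two clips of different length, 8 mel bins per frame (illustrative sizes)
short_clip = [[0.0] * 8 for _ in range(80)]
long_clip = [[0.0] * 8 for _ in range(200)]
print(segment_starts(80, 64, 5))   # [0, 4, 8, 12, 16]
print(segment_starts(200, 64, 5))  # [0, 34, 68, 102, 136]
print(len(segment(short_clip, 64, 5)), len(segment(long_clip, 64, 5)))  # 5 5
```

Short clips get heavily overlapping segments and long clips get sparser ones. Clips shorter than one window still need padding first.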
4 votes, 3 answers

SQL windowing to persist a record on a given condition

I have some data about a website that has different shop sections, but when the user checks out at the end, we only know which shop section it is by looking for their most recent section hit. For example, if I have data that looks…
shecode
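A common window-function approach for carrying the most recent non-NULL value forward has two steps. First build a group id with a running `COUNT(section)`, which ignores NULLs. Then take `MAX(section)` within each group. The sketch runs this in SQLite (3.25 or later) through Python's `sqlite3`, using a hypothetical `hits` table in which checkout rows have no section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hits (user_id INTEGER, ts INTEGER, page TEXT, section TEXT);
INSERT INTO hits VALUES
  (1, 1, 'home',     NULL),
  (1, 2, 'shoes',    'shoes'),
  (1, 3, 'cart',     NULL),
  (1, 4, 'hats',     'hats'),
  (1, 5, 'checkout', NULL),
  (2, 1, 'bags',     'bags'),
  (2, 2, 'checkout', NULL);
""")

CARRY_FORWARD = """
WITH grouped AS (
    SELECT user_id, ts, page, section,
           -- running count of non-NULL sections: each new section starts a group
           COUNT(section) OVER (PARTITION BY user_id ORDER BY ts) AS grp
    FROM hits
)
SELECT user_id, ts, page,
       MAX(section) OVER (PARTITION BY user_id, grp) AS last_section
FROM grouped
ORDER BY user_id, ts
"""
rows = conn.execute(CARRY_FORWARD).fetchall()
for row in rows:
    print(row)
```

Every row in a group shares the one non-NULL section that opened it. A checkout row therefore inherits the section of the latest hit before it.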
3 votes, 2 answers

Apache beam windowing: consider late data but emit only one pane

I would like to emit a single pane when the watermark reaches x minutes past the end of the window. This lets me handle some late data but still emit only one pane. I am currently working in Java. At the moment I can't find proper…
Joe Stoker
3 votes, 2 answers

How to find the number of active users for, say, 1 day, 2 days, 3 days… (PostgreSQL)

A distribution of # days active within a week: I am trying to find how many members are active for 1 day, 2 days, 3 days, …, 7 days during a specific week, 3/1-3/7. Is there any way to use an aggregate function on top of PARTITION BY? If not, what can be used…
KK44
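Instead of an aggregate on top of `PARTITION BY`, the distribution can be computed with two nested `GROUP BY`s. The inner one counts distinct active days per member within the week, and the outer one counts members per day count. The sketch uses SQLite through Python's `sqlite3`. The `activity` table and the year 2021 are hypothetical, and the query is plain SQL that PostgreSQL should also accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (member_id INTEGER, active_date TEXT);
INSERT INTO activity VALUES
  (1, '2021-03-01'), (1, '2021-03-01'), (1, '2021-03-02'),
  (2, '2021-03-03'),
  (3, '2021-03-01'), (3, '2021-03-04'), (3, '2021-03-07'),
  (4, '2021-03-05'), (4, '2021-03-06'),
  (5, '2021-03-08');
""")

DISTRIBUTION = """
SELECT days_active, COUNT(*) AS members
FROM (
    -- distinct active days per member inside the week
    SELECT member_id, COUNT(DISTINCT active_date) AS days_active
    FROM activity
    WHERE active_date BETWEEN '2021-03-01' AND '2021-03-07'
    GROUP BY member_id
) AS per_member
GROUP BY days_active
ORDER BY days_active
"""
rows = conn.execute(DISTRIBUTION).fetchall()
for days_active, members in rows:
    print(days_active, members)
```

`COUNT(DISTINCT ...)` keeps repeated activity on the same day from inflating a member's count. Member 5 falls outside the week and is not counted.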
3 votes, 1 answer

Find the first index for which an array goes below a certain threshold (and stay below for some time)

Let A be a 1D numpy array, a threshold t, and a window length K. How to find the minimal index j, such that A[j:j+K] < t? (i.e. the first time A stays below the threshold on a full window of width K). I've tried (unfinished) things with a loop, but…
Basj
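A single pass that tracks the length of the current below-threshold run finds the first full window without checking every slice `A[j:j+K]`. The sketch is plain Python for clarity, and a 1-D NumPy array can be passed in as well. `first_sustained_below` is a hypothetical name.

```python
def first_sustained_below(a, t, k):
    """Smallest j such that every value in a[j:j+k] is below t, or -1 if
    there is none. One pass, tracking the length of the current run below t."""
    run = 0
    for i, x in enumerate(a):
        run = run + 1 if x < t else 0
        if run == k:
            return i - k + 1
    return -1


a = [5, 1, 1, 6, 1, 1, 1, 7]
print(first_sustained_below(a, t=2, k=3))  # 4
print(first_sustained_below(a, t=2, k=4))  # -1
```

For long arrays, a vectorized NumPy variant can compute a moving sum of `(A < t)` over windows of length K and take the first position where that sum equals K.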
3 votes, 0 answers

Apache Beam: Custom Windowing (windowfn)

Gurus, I am new to Apache Beam and trying to implement what seems to be a pretty straightforward use case. I have stock data and I need to find a rolling average price of the stock over the past 10 transactions. Now since there is no fixed time…
hpep
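The window here is a count of transactions rather than a span of time, so the core computation is a count-based sliding window. This plain-Python sketch shows only that arithmetic with a fixed-size deque. It is not Beam code, and `rolling_average` is a hypothetical helper. In Beam, one option is to keep the last N prices per stock in per-key state inside a stateful `DoFn`.

```python
from collections import deque


def rolling_average(prices, n=10):
    """Yield the mean of the last n prices after each transaction (fewer than
    n at the start). The window is count-based, so time gaps do not matter."""
    window = deque(maxlen=n)
    total = 0.0
    for price in prices:
        if len(window) == n:
            total -= window[0]   # this value is evicted by the append below
        window.append(price)
        total += price
        yield total / len(window)


print(list(rolling_average([1, 2, 3, 4, 5], n=3)))  # [1.0, 1.5, 2.0, 3.0, 4.0]
```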
3 votes, 1 answer

Apache Flink: Skewed data distribution on KeyedStream

I have this Java code in Flink:
env.setParallelism(6);
//Read from Kafka topic with 12 partitions
DataStream line = env.addSource(myConsumer);
//Filter half of the records
DataStream> line_Num_Odd =…
3 votes, 1 answer

TSQL Replacing "Quirky Update" Calculations with Windowing and CTE

I am trying to come up with an alternative to using the "Quirky Update" using windowing and CTE for a somewhat complex calculation of "running performance". The quick math for running performance is ((1 + Running) * (1 + Daily)) - 1. This running…
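Each step multiplies by (1 + Daily), so the running value equals the product of all (1 + Daily) so far, minus 1. There is no built-in windowed product, but a product can be written as the exponential of a windowed sum of logarithms, which replaces the row-by-row update. The sketch shows this in SQLite via Python's `sqlite3`. It registers `py_ln` and `py_exp` (hypothetical names) because SQLite's own math functions are an optional build feature, and the `returns` table and its values are made up. It assumes 1 + Daily > 0 on every row, since the logarithm is undefined otherwise.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register Python's natural log and exp under illustrative names.
conn.create_function("py_ln", 1, math.log)
conn.create_function("py_exp", 1, math.exp)
conn.executescript("""
CREATE TABLE returns (d TEXT, daily REAL);
INSERT INTO returns VALUES
  ('2020-01-01',  0.10),
  ('2020-01-02', -0.05),
  ('2020-01-03',  0.02);
""")

# Running = exp(sum of ln(1 + Daily) so far) - 1, i.e. the product of
# (1 + Daily) so far minus 1, computed with a window instead of a loop.
RUNNING = """
SELECT d, daily,
       py_exp(SUM(py_ln(1 + daily)) OVER (
           ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) - 1 AS running
FROM returns
ORDER BY d
"""
rows = conn.execute(RUNNING).fetchall()

# Cross-check against the recursive definition ((1 + Running) * (1 + Daily)) - 1
running = 0.0
for d, daily, windowed in rows:
    running = (1 + running) * (1 + daily) - 1
    print(d, round(windowed, 6), round(running, 6))
```

In T-SQL the corresponding expression would use `EXP` and `LOG` (natural logarithm) around `SUM(...) OVER (ORDER BY ... ROWS UNBOUNDED PRECEDING)`, which SQL Server 2012 supports. The results can differ from the iterative calculation in the last floating-point digits.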
3 votes, 5 answers

How to get a negative row number in SQL Server

I have a set of data with a DateTime column, say CalculatedOn. What I would like is to start at the current date (getdate()) and get x records from before the current date and the same number from after it. If x = 50, then 50 prior to now and 50…
sprocket12
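One way to get negative numbers for earlier rows is to split the rows at the reference time and number each side separately with `ROW_NUMBER()`, negating the numbers on the past side. Filtering on `rn BETWEEN -x AND x` then returns x rows on each side. The sketch uses SQLite via Python's `sqlite3`, with a fixed `:now` parameter standing in for `getdate()` and a hypothetical `calcs` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calcs (id INTEGER, CalculatedOn TEXT);
INSERT INTO calcs VALUES
  (1, '2020-01-01'), (2, '2020-01-02'), (3, '2020-01-03'),
  (4, '2020-01-05'), (5, '2020-01-06'), (6, '2020-01-07');
""")

AROUND_NOW = """
WITH flagged AS (
    SELECT id, CalculatedOn,
           CASE WHEN CalculatedOn < :now THEN 1 ELSE 0 END AS is_past
    FROM calcs
),
numbered AS (
    SELECT id, CalculatedOn,
           CASE WHEN is_past = 1
                -- past rows: -1 for the most recent, -2 for the one before...
                THEN -ROW_NUMBER() OVER (PARTITION BY is_past ORDER BY CalculatedOn DESC)
                -- current/future rows: 1, 2, 3... moving forward in time
                ELSE ROW_NUMBER() OVER (PARTITION BY is_past ORDER BY CalculatedOn)
           END AS rn
    FROM flagged
)
SELECT id, CalculatedOn, rn
FROM numbered
WHERE rn BETWEEN -:x AND :x
ORDER BY CalculatedOn
"""
rows = conn.execute(AROUND_NOW, {"now": "2020-01-04", "x": 2}).fetchall()
for row in rows:
    print(row)
```

In SQL Server the same query shape would use `GETDATE()` in place of `:now` and a variable for x.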
2 votes, 1 answer

Rolling window on timestamped DataFrame with a custom step?

I have been fiddling about with pandas.DataFrame.rolling for some time now and I haven't been able to achieve the result that I am looking for, so before I write a custom windowing function I figured I would ask if I'm missing something. I have…
abscond
2 votes, 1 answer

StructuredStreaming withWatermark - TypeError: 'module' object is not callable

I have a Structured Streaming PySpark program running on GCP Dataproc, which reads data from Kafka and does some data massaging and aggregation. I'm trying to use withWatermark(), and it is giving an error. Here is the code: df_stream =…