1

I am trying to get all the changes over a different period of time. My issue is that I have row where the values are the same than one before. I am using Azure Dataflows (SQL or SPARK answer will do it). In my example you can see that the hash value aaa repeats itself, so when i am doing a window over nkHash and hash, it puts it together and im losing the change.

Here is an example:

nkHash hash createdDate
A aaa 1
A aaa 2
A bbb 3
A aaa 4
A ccc 5

My final answer should look like this

nkHash hash createdDate
A aaa 1
A bbb 3
A aaa 4
A ccc 5
Justin
  • 91
  • 8

1 Answers1

2

This is basically just a gaps and islands problem, except you just want to keep the islands where the row num = 1.

The following query was taken from this question - sql window function every time partionion value changes

This was a competing answer to mine that got selected. If you don't like this answer you can try mine as well. Essentially what we're doing is we are using the lag function to check the previous value. Every time the value changes we count that as a flip. All records with the same number of flip are part of the same group and restart the row numbering (which we are partitioning on).

Finally, we just add an extra query at the end to get the row number = 1.

with u as 
(
   select nkHash, hash, createdDate, lag(hash) over (order by createdDate) as previousi
   from sample
),
v as
(
  select nkHash, hash, createdDate, 
    sum(case when hash = previousi then 0 else 1 end)
    over (order by createdDate rows unbounded preceding) as flips
  from u
),
x as
(
select nkHash, hash, createdDate, row_number() over (partition by flips order by createdDate) as rn
from v
)
select nkHash, hash, createdDate
from x
where rn = 1

I have a sqlfiddle you can use to validate the answer: http://sqlfiddle.com/#!18/bcb103/6/0

Also, have a screenshot of the results. enter image description here

DonkeyKongII
  • 431
  • 2
  • 8