
Given a query that looks like this:

SELECT
    EventDate,
    system.Timestamp as test
INTO
    [azuretableoutput]
FROM
    [csvdata] TIMESTAMP BY EventDate

According to the documentation, EventDate should now be used as the timestamp. However, when storing data in Blob storage with this path pattern:

sadata/Y={datetime:yyyy}/M={datetime:MM}/D={datetime:dd}

I still seem to get the ingestion time. In my case, the ingestion time is meaningless, and I need to use EventDate for the path. Is this possible?

When checking the data in Visual Studio, test and EventDate should be equal; however, the results look like this:

EventDate                   ;Test
2020-04-03T11:13:07.3670000Z;2020-04-09T02:16:15.5390000Z
2020-04-03T11:13:07.0460000Z;2020-04-09T02:16:15.5390000Z
2020-04-03T11:13:07.0460000Z;2020-04-09T02:16:15.5390000Z
2020-04-03T11:13:07.3670000Z;2020-04-09T02:16:15.5390000Z
2020-04-03T11:13:08.1470000Z;2020-04-09T02:16:15.5390000Z

The late arrival tolerance window is set to 99:23:59:59, and the out-of-order tolerance is set to 00:00:00:00, with the out-of-order action set to adjust.

When I run the same query in Stream Analytics on Azure, I get this result:

[{"eventdate":"2020-04-03T11:13:20.1060000Z","test":"2020-04-03T11:13:20.1060000Z"},
{"eventdate":"2020-04-03T11:13:20.1060000Z","test":"2020-04-03T11:13:20.1060000Z"},
{"eventdate":"2020-04-03T11:13:20.1060000Z","test":"2020-04-03T11:13:20.1060000Z"}] 

So far so good. However, when the query runs on Azure with incoming data, it produces this path:

 Y=2020/M=04/D=09

It should have produced this path: Y=2020/M=04/D=03. Interestingly enough, when I check the data that is actually stored in Blob storage, I find this:

EventDate,test
2020-04-03T11:20:39.3100000Z,2020-04-09T19:33:35.3870000Z,

System.Timestamp seems to be altered only when testing the query on sampled data; it is not actually altered when the query is running normally and receiving data.

I have tested this with the late arrival setting at 0 and at 20 days. In reality I need to disable late arrival adjustment, because I might get events through the pipeline that are years old.

ruffen
  • Do you get the desired timestamp when you test the query or when your blob output does not use a path? – Lenna Apr 07 '20 at 05:06
  • The blob output always uses the current date in the path, even if the EventDate timestamp is 2-3 days earlier. I check it at the output, and I test it through Visual Studio. – ruffen Apr 07 '20 at 08:58
  • The value of "System.Timestamp" is used for the blob output path. "System.Timestamp" is normally assigned from the value in "TIMESTAMP BY"; however, due to the out-of-order threshold and the late arrival threshold, it can be different. You can select "System.Timestamp" as a column to confirm the behavior. – Lenna Apr 08 '20 at 16:45
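Lenna's point can be pictured with a minimal Python sketch. It assumes the {datetime:yyyy}, {datetime:MM} and {datetime:dd} tokens are filled from System.Timestamp in UTC; `blob_path` is a hypothetical helper, not part of any Azure SDK. Using the EventDate and test values from the row stored in Blob storage, the same event would land in different folders depending on which timestamp drives the path:

```python
from datetime import datetime, timezone


def blob_path(system_timestamp: datetime, prefix: str = "sadata") -> str:
    """Mimic the pattern sadata/Y={datetime:yyyy}/M={datetime:MM}/D={datetime:dd}.

    Assumption: the date tokens are taken from System.Timestamp, in UTC.
    """
    ts = system_timestamp.astimezone(timezone.utc)
    return f"{prefix}/Y={ts:%Y}/M={ts:%m}/D={ts:%d}"


# Values from the row that was actually written to Blob storage.
event_date = datetime(2020, 4, 3, 11, 20, 39, 310000, tzinfo=timezone.utc)
system_ts = datetime(2020, 4, 9, 19, 33, 35, 387000, tzinfo=timezone.utc)

print(blob_path(event_date))  # sadata/Y=2020/M=04/D=03
print(blob_path(system_ts))   # sadata/Y=2020/M=04/D=09
```

If the path follows System.Timestamp, the folder only matches EventDate when no late arrival or out-of-order adjustment has moved the timestamp.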

1 Answer

This issue has been brought up and closed on the MicrosoftDocs GitHub repository.

The Microsoft folks say:

The maximum late arrival tolerance is 20 days, so if the policy is set to 99:23:59:59 (99 days), the adjustment could be causing a discrepancy in System.Timestamp.
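The tolerance values in the question use a dd:hh:mm:ss layout, so a quick way to see how far 99:23:59:59 sits outside the quoted limit is to parse it. This is a minimal sketch; `parse_tolerance` and `exceeds_max_days` are hypothetical helpers, and the 20-day limit is taken from the statement above rather than from an Azure API:

```python
from datetime import timedelta

# Limit quoted in the MicrosoftDocs issue: at most 20 days of late arrival.
MAX_LATE_ARRIVAL_DAYS = 20


def parse_tolerance(value: str) -> timedelta:
    """Parse a window written as dd:hh:mm:ss, e.g. '99:23:59:59'."""
    days, hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def exceeds_max_days(value: str) -> bool:
    """True when the whole-day part of the window is above the quoted limit."""
    return parse_tolerance(value).days > MAX_LATE_ARRIVAL_DAYS


print(parse_tolerance("99:23:59:59"))   # 99 days, 23:59:59
print(exceeds_max_days("99:23:59:59"))  # True
print(exceeds_max_days("20:00:00:00"))  # False
```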

By definition of late arrival tolerance window, for each incoming event, Azure Stream Analytics compares the event time with the arrival time; if the event time is outside of the tolerance window, you can configure the system to either drop the event or adjust the event’s time to be within the tolerance.
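A minimal sketch of that late arrival rule in Python follows. It assumes an event counts as late when its time is earlier than arrival time minus the tolerance, and that "adjust" moves it to exactly that boundary; `apply_late_arrival` is a hypothetical function, not the service's implementation:

```python
from datetime import datetime, timedelta
from typing import Optional


def apply_late_arrival(event_time: datetime, arrival_time: datetime,
                       tolerance: timedelta, action: str) -> Optional[datetime]:
    """Return the timestamp the event would carry, or None if it is dropped."""
    if action not in ("adjust", "drop"):
        raise ValueError(f"unknown action: {action!r}")
    earliest_allowed = arrival_time - tolerance
    if event_time >= earliest_allowed:
        return event_time        # inside the window: keep the event time
    if action == "adjust":
        return earliest_allowed  # outside the window: clamp to the boundary
    return None                  # outside the window: drop the event


event_time = datetime(2020, 4, 3, 11, 20, 39)
arrival_time = datetime(2020, 4, 9, 19, 33, 35)

print(apply_late_arrival(event_time, arrival_time, timedelta(0), "adjust"))
# 2020-04-09 19:33:35
print(apply_late_arrival(event_time, arrival_time, timedelta(days=20), "adjust"))
# 2020-04-03 11:20:39
```

Under this model, a zero tolerance turns a delayed event's timestamp into its arrival time, while a 20-day window would keep the original time for an event from 2020-04-03 that arrives on 2020-04-09.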

Consider that after watermarks are generated, the service can potentially receive events with event time lower than the watermark. You can configure the service to either drop those events, or adjust the event’s time to the watermark value.
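The watermark behaviour can be sketched the same way. This Python version assumes the watermark is the largest event time seen so far minus the out-of-order tolerance, and it ignores the reordering buffer the service keeps inside that window; `apply_out_of_order` is a hypothetical name. With the question's out-of-order tolerance of 00:00:00:00, any event that arrives after a later one is moved up to the watermark:

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


def apply_out_of_order(event_times: Iterable[datetime], tolerance: timedelta,
                       action: str) -> List[Optional[datetime]]:
    """Process events in arrival order; None marks a dropped event."""
    if action not in ("adjust", "drop"):
        raise ValueError(f"unknown action: {action!r}")
    result: List[Optional[datetime]] = []
    largest_seen: Optional[datetime] = None
    for t in event_times:
        if largest_seen is not None and t < largest_seen - tolerance:
            # Below the watermark: adjust to it or drop the event.
            result.append(largest_seen - tolerance if action == "adjust" else None)
            continue
        result.append(t)
        largest_seen = t if largest_seen is None else max(largest_seen, t)
    return result


# EventDate values from the Visual Studio test, assumed to arrive in this order.
times = [datetime(2020, 4, 3, 11, 13, 7, 367000),
         datetime(2020, 4, 3, 11, 13, 7, 46000),
         datetime(2020, 4, 3, 11, 13, 7, 46000),
         datetime(2020, 4, 3, 11, 13, 7, 367000),
         datetime(2020, 4, 3, 11, 13, 8, 147000)]
for t in apply_out_of_order(times, timedelta(0), "adjust"):
    print(t.isoformat())
```

Note that this kind of adjustment only moves a timestamp up to the latest event time already seen, so on its own it would not push an event from 2020-04-03 into a 2020-04-09 folder.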

As a part of the adjustment, the event’s System.Timestamp is set to the new value, but the event time field itself is not changed. This adjustment is the only situation where an event’s System.Timestamp can be different from the value in the event time field, and may cause unexpected results to be generated.
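That distinction can be modelled with a small record type. This sketch is an illustration rather than how Stream Analytics stores events, and `Row` and `adjust` are hypothetical names. Only `system_timestamp` changes when a policy adjustment happens, which reproduces the shape of the row found in Blob storage (EventDate on 2020-04-03, test on 2020-04-09):

```python
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class Row:
    event_date: datetime        # the column named in TIMESTAMP BY
    system_timestamp: datetime  # the value System.Timestamp returns


def adjust(row: Row, new_time: datetime) -> Row:
    """Apply a policy adjustment: only the system timestamp moves."""
    return replace(row, system_timestamp=new_time)


event_date = datetime(2020, 4, 3, 11, 20, 39, 310000)
original = Row(event_date=event_date, system_timestamp=event_date)
adjusted = adjust(original, datetime(2020, 4, 9, 19, 33, 35, 387000))

print(adjusted.event_date.isoformat())        # 2020-04-03T11:20:39.310000
print(adjusted.system_timestamp.isoformat())  # 2020-04-09T19:33:35.387000
```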

For more information, please see Understand time handling in Azure Stream Analytics.

Unfortunately, testing with sample data in the Azure portal doesn't take these policies into account at this time.

Potentially other helpful resources:

Lenna
  • I added some more information to the question; it seems the System.Timestamp property is not altered at all, even with TIMESTAMP BY. – ruffen Apr 09 '20 at 16:57
  • After looking closer, the System.Timestamp property is altered when running the query in the Azure portal; however, it does not actually produce the correct path. – ruffen Apr 09 '20 at 19:40
  • I am working on trying to reproduce your situation, but I will likely miss the bounty deadline. For me the path is adjusting perfectly, but I have not tried changing the settings, so I'm not sure what you might be doing differently. I am finishing up the last semester of my senior year right now, and my university is piling on the work, haha! Will try to solve this for you ASAP though! – Lenna Apr 12 '20 at 22:13
  • I have no issue adding a bounty again if I can get this resolved. Are you testing this in VS or in the portal? Could you share the exact query, path settings and test data you are using that work? Are you adjusting by more than one day as well? Also, I am receiving sensor data that can be up to 3 years old, which means I have to go outside the late arrival threshold or turn it off, and I am not sure that is possible. – ruffen Apr 12 '20 at 22:20
  • I am using Visual Studio, and I am not currently using test cases that generate 3-year-old data (I am playing with only a few days), so I'll switch that up and make better tests. I have a theory about your issues though... It may have to do with section 5 of [Side effects of event ordering time tolerances](https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-time-handling). "**System.Timestamp** value is different from the time in the **event time** field." I fear **event time** might be changing the path? – Lenna Apr 13 '20 at 15:56
  • I am trying to figure out the relationship between **System.Timestamp**, **event time**, the mystery of what Azure decides it should change due to event ordering nonsense, and how all of it changes the **blob output path**. I poked at it this morning without much progress; I will switch gears to actual schoolwork and come back to this later. – Lenna Apr 13 '20 at 16:00
  • My theory is that when you change System.Timestamp, Azure compares this timestamp to the ingestion timestamp. If the difference between them is larger than the threshold, it adjusts or drops the event depending on the setting. The solution seems to be to disable this threshold, but this might only be possible through an ARM template. I have tried setting it to -1 in Visual Studio, but then I don't get any output even though the action is set to adjust; when I set it to 0, I get events. -1 seems to break it. – ruffen Apr 13 '20 at 16:35
  • @ruffen I'm afraid we can't provide further assistance because we don't know the underlying mechanics of System.Timestamp. I'd suggest submitting this concern to the ASA GitHub issues (https://github.com/Azure/azure-stream-analytics/issues) so that you can get an official response about this feature. – Jay Gong May 04 '20 at 06:33