2

I have a timestamp column in DocDb, I would like to query that in Azure Data Factory copy pipeline, which copies DocDb to Azure Data Lake

I would like to

select * from c
where c._ts > '@{pipeline().parameters.windowStart}'

But I got

Errors":["An invalid query has been specified with filters against path(s) that are not range-indexed.

In the DocDb policy, I have

"includedPaths": [
    {
        "path": "/*",
        "indexes": [
            {
                "kind": "Range",
                "dataType": "Number",
                "precision": -1
            },
            {
                "kind": "Hash",
                "dataType": "String",
                "precision": 3
            }
        ]
    }
  ]

I think this should allow _ts int64 to be queried by range.

Where did I go wrong?

Thanks.

Daolin
  • 614
  • 1
  • 16
  • 41

2 Answers2

1

I reproduce your issue with your sql and your index policy.

enter image description here

Based on my observation, it seems that the filter is treated as String,not Int.You could remove the ' in your sql and try again,it works for me.

sql:

select * from c
where c._ts > @{pipeline().parameters.windowStart}

Output:

enter image description here

Jay Gong
  • 23,163
  • 2
  • 27
  • 32
  • I did tried that early. Error: DocumentDb operation failed: Message: {"errors":[{...,"message":"Syntax error, invalid numeric value token '24T01'."}]} ActivityId: 9de19137-66d1-45d3-937f-784556364334, documentdb-dotnet-sdk/2.1.3 Host/64-bit MicrosoftWindowsNT/6.2.9200.0.. Activity ID:9de19137-66d1-45d3-937f-784556364334. I also tried @{formatDateTime(pipeline().parameters.windowStart, 'yyyy-MM-ddTHH:mm:ssZ' ).TotalSeconds} @{pipeline().parameters.windowStart.ToUnixTimeSeconds()} @{pipeline().parameters.windowStart.TotalSeconds} None seems to be working – Daolin Jan 15 '19 at 17:05
0

Thanks, @Jay.

I ended up using UDF

function dateTime2Epoch(dateTimeString){
    return Math.trunc(new Date(dateTimeString).getTime()/1000);
}

in Cosmos db. Then in Azure Data Factory query

select * from c 
where c._ts >= udf.dateTime2Epoch('@{pipeline().parameters.windowStart}')
  and c._ts < udf.dateTime2Epoch('@{pipeline().parameters.windowEnd}')

However, the query seems to be very slow. I will update this when I found more.

Update: Ended up with copying the whole thing.

Community
  • 1
  • 1
Daolin
  • 614
  • 1
  • 16
  • 41