0

I'm using U-SQL with a table in Azure Data Lake Analytics. Is there any way to pass a list of partition keys generated in a C# program to the U-SQL script then have the script return all the elements in those partitions?

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Justin Borromeo
  • 1,201
  • 3
  • 13
  • 26

1 Answers1

2

Do you want to run the C# code on your dev box and pass values to a U-SQL script or run C# code inside your U-SQL Script? Your description is not clear. Based on your question title, I will answer your first question.

Passing values as parameters from a C# program: The ADLA SDK (unlike Azure Data Factory) does not yet provide a parameter model for U-SQL scripts (please file a request at http://aka.ms/adlfeedback, although I know it is on our backlog already, having external customer demand helps in prioritization).

However it is fairly easy to add your parameter values by prepending DECLARE statements like the following in the beginning of the script and have the script refer to them as variables.

DECLARE @param = new SqlArray<int>( 1, 2, 3, 4); // 1,2,3,4 were calculated in your C# code (I assume you have int partition keys).

Then you should be able to use the array in a predicate (e.g., @param.Contains(partition_col)). That will not (yet, we have a workitem for it) trigger partition elimination though.

If you want partition elimination, you will have to have a fixed set of parameter values and use them in an IN clause. E.g., you want to check up to 3 months, you would write the query predicate as:

WHERE partition_col IN (@p1, @p2, @p3);

And you prepend definitions for @p1, @p2 and @p3, possibly duplicating values for the parameters you do not need.

Michael Rys
  • 6,684
  • 15
  • 23
  • So all of the parameter values must be hardcoded ahead of time? – Justin Borromeo Feb 06 '17 at 21:22
  • I have a C# function (getKeys) that generates a list of string partition keys to query a Data Lake Store database with. Is there any way to query the database with U-SQL with those partition keys (i.e. not hardcoded) in such a way that partition elimination occurs? My code that doesn't result in partition elimination is below: @results = SELECT RowKey FROM Schema.TableName WHERE USQLApp.queryHelper.getKeys().Contains(PartitionKey) – Justin Borromeo Feb 07 '17 at 20:15
  • Currently only simple comparison predicates where the values can be constant-folded (e.g., are known at compile time) can be used for partition elimination. The hard coding of the parameter values is not that big of an issue since every script will be compiled on submission. Thus adding the parameter values at the submission time works most of the time. – Michael Rys Feb 07 '17 at 20:51
  • If you can only hardcode parameters, is it possible to use U-SQL and ADLA with a .NET application in which the user can indirectly change parameters? In other words, if the user can set filters that alter the partition keys being queried, is it at all possible to change the query? – Justin Borromeo Feb 07 '17 at 21:06
  • Yes. you have your normal script and you prepend the DECLARE @p statements as I outline above. That's what ADF does. The SDK is currently working on adding a layer on top to make this a better experience. – Michael Rys Feb 08 '17 at 10:22
  • What does prepending a DECLARE statement mean and would you mind providing a short code sample? – Justin Borromeo Feb 08 '17 at 14:05
  • 1
    If `script` is a C# variable containing your script with a variable parameter `@p`, and `p` is the variable containing your parameterized value, you would write something like the following before submitting it with the SDK: `script = String.Format("DECLARE @p = {0}; {1}", p, script);` Obviously, you should also guard against SQL Injections if `p` is a string. – Michael Rys Feb 08 '17 at 17:58