2

We have a large U-SQL table containing simple time series data. The table is partitioned per day. Whenever a new batch of data is received, we need to insert new time series data points AND update any previously received data points with a new value, in case the new batch contains updated values for old data points.

Since we cannot perform granular UPDATEs or DELETEs with U-SQL, we wanted to simply truncate the affected partitions and insert the recalculated daily values. Our U-SQL script that does the merge, identifies which partitions need to be truncated.

Unfortunately, since we cannot create loops in U-SQL, there seems to be no way to dynamically truncate the identified partitions. A suggestion I found elsewhere, was to hand the truncation of partitions over to a PowerShell script, but I would really like to keep everything inside the same U-SQL script, to avoid storing and retrieving temporary rowsets any more than necessary.

I thought about using a custom C# function, but it doesn't seem like the U-SQL SDK allows C# functions to access/modify database metadata. Are there any other options available?

Dan
  • 10,480
  • 23
  • 49

1 Answers1

1

The SDK allows you to query meta data, but not to manipulate the objects.

Another option is that you write a script to generate the script based on the data and then run the generated script. It still means that you write two scripts, but you don't really have to store temporary data.

Do you know how many partitions you may need to update going back?

Michael Rys
  • 6,684
  • 15
  • 23
  • 1
    Unfortunately, we don't know which partitions need to be truncated until after we have EXTRACT'ed the newly arrived data, which is itself costly due to the volumes involved. We will experiment a little to see if the cost of storing data temporarily is justified, compared to generating reverse postings, which will allow us to do inserts only. – Dan Aug 30 '17 at 18:09
  • Please post requests for missing capabilities for your scenarios to http://aka.ms/adlfeedback. For example, you can upvote a request for data manipulation statements such as DELETE, UPDATE there. – Michael Rys Aug 30 '17 at 18:37
  • Trying to truncate a partitioned table gives the following error: `A partition clause is required when referring to a partitioned table.` Is there a simple way to truncate ALL partitions of a table at once, or would it be best to drop and recreate the table? – Dan Nov 13 '17 at 09:43