
I have an existing set of Azure Storage tables, one per client, that hold events in a multi-tenant cloud system.

For example, there might be three tables holding sign-in information:

ClientASignins, ClientBSignins, ClientCSignins

Is there a way to dynamically loop through these as part of either a copy operation or in something like a Pig script?

Or is there another way to achieve this result?

Many thanks!

SteveM1972

2 Answers


If you keep a list of these tables in another location, such as Azure Storage, you could use PowerShell to loop through them and create a Hive table over each one. For example:

# Assumes each entry in $tableList exposes a tableName property.
foreach ($t in $tableList) {
    # Build the Hive DDL that maps an external Hive table onto the Azure table.
    $hiveQuery = "CREATE EXTERNAL TABLE $($t.tableName)(IntValue int)
 STORED BY 'com.microsoft.hadoop.azure.hive.AzureTableHiveStorageHandler'
 TBLPROPERTIES(
  ""azure.table.name""=""$($t.tableName)"",
  ""azure.table.account.uri""=""http://$storageAccount.table.core.windows.net"",
  ""azure.table.storage.key""=""$((Get-AzureStorageKey $storageAccount).Primary)"");"

    # Write the query to a local file and upload it to the cluster's container.
    Out-File -FilePath .\HiveCreateTable.q -InputObject $hiveQuery -Encoding ascii
    $hiveQueryBlob = Set-AzureStorageBlobContent -File .\HiveCreateTable.q -Blob "queries/HiveCreateTable.q" `
        -Container $clusterContainer.Name -Force

    # Run the uploaded query as a Hive job and wait for it to finish.
    $createTableJobDefinition = New-AzureHDInsightHiveJobDefinition -QueryFile /queries/HiveCreateTable.q
    $job = Start-AzureHDInsightJob -JobDefinition $createTableJobDefinition -Cluster $cluster.Name
    Wait-AzureHDInsightJob -Job $job

    # INSERT YOUR OPERATIONS FOR EACH TABLE HERE
}

Research: http://blogs.msdn.com/b/mostlytrue/archive/2014/04/04/analyzing-azure-table-storage-data-with-hdinsight.aspx

How can manage Azure Table with Powershell?

Community
Andrew Moll

In the end I opted for a couple of Azure Data Factory custom activities written in C#, and my workflow is now:

  1. Custom activity: aggregate the data for the current slice into a single blob file for analysis in Pig.
  2. HDInsight: Analyse with Pig
  3. Custom activity: distribute the data from blob storage to the array of target tables in table storage.

I did this to keep the pipelines as simple as possible and remove the need for any duplication of pipelines/scripts.
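
As a rough illustration of steps 1 and 3 only, here is a minimal Python sketch of the aggregate-then-disperse idea. It is not the C# custom activity code: the per-client tables and the blob are stood in for by in-memory dictionaries and a CSV string, and the `user` and `timestamp` fields, the function names, and the `<Client>Signins` naming helper are all hypothetical choices made for the example.

```python
import csv
import io
from collections import defaultdict


def target_table(client: str, suffix: str = "Signins") -> str:
    # Hypothetical naming scheme based on the question: "<Client>Signins".
    return f"{client}{suffix}"


def aggregate(tables: dict[str, list[dict]]) -> str:
    """Step 1 (sketch): merge rows from every per-client table into one
    CSV 'blob', tagging each row with the client it came from."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for client in sorted(tables):
        for row in tables[client]:
            writer.writerow([client, row["user"], row["timestamp"]])
    return out.getvalue()


def disperse(blob_text: str) -> dict[str, list[dict]]:
    """Step 3 (sketch): read the blob back and route each row to the
    target table derived from its client tag."""
    routed = defaultdict(list)
    for client, user, timestamp in csv.reader(io.StringIO(blob_text)):
        routed[target_table(client)].append(
            {"user": user, "timestamp": timestamp}
        )
    return dict(routed)
```

Because every row carries its client tag, the analysis step in between can work on one file, and the set of target tables is derived from the data rather than hard-coded into the pipeline.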

References:

Use Custom Activities In Azure Data Factory pipeline

HttpDataDownloader Sample

SteveM1972