I need some input on file processing in Azure Data Lake Storage using PowerShell.

I have a pipe-delimited input file in my ADLS Gen1 account.

The file content looks like this:

1|2|3|a,b,
3|4|5|d,h,

I am able to remove the trailing comma using PowerShell on my local PC with the code below:

Get-Content $file_name | ForEach-Object {$_.TrimEnd(",")  } 

But when I run the same command against the same file in my Azure Data Lake Storage Gen1 account, nothing happens to the data. The code I am using is:

Get-AzureRmDataLakeStoreItemContent -Account $accountName -Path $myrootdir/path/test.csv| ForEach-Object {$_.TrimEnd( ",")  }

One observation I have is that ForEach-Object runs only once. That is, if I print hello inside the ForEach-Object block, it prints only once. But I verified that there is no newline problem by running the command with -Head and -Tail. I am attaching a screenshot of this.

Can you please help me understand what I am doing wrong here, and suggest any alternative way to remove the last comma in each line?

[Screenshot: comparison of behavior between local and ADLS]

  • What is the returned type when you use `Get-AzureRmDataLakeStoreItemContent`? A string or an array? – Ivan Glasenberg Nov 28 '18 at 09:03
  • Hello Ivan, as per the documentation it is a Byte or a String. https://learn.microsoft.com/en-us/powershell/module/azurerm.datalakestore/get-azurermdatalakestoreitemcontent?view=azurermps-6.13.0 – Naveen Venugopal Nov 28 '18 at 09:13

1 Answer

I don't think you can modify the store item directly via PowerShell.

Get-AzureRmDataLakeStoreItemContent just gets the content. (In my experience, if modifying it in place were allowed, there would be a command like Set-AzureRmDataLakeStoreItemContent or Update-AzureRmDataLakeStoreItemContent.)

The workaround is to export the file -> modify it locally -> import it again.

Update:

If I have not misunderstood your question, try the command below.

((Get-AzureRmDataLakeStoreItemContent -AccountName "joydatalake1" -Path "/sss/test.csv").ToString() -split("`r")).Trim() | ForEach-Object {$_.TrimEnd(",")}
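A minimal sketch of why the split is needed, runnable without an Azure account. It uses a hard-coded string as a stand-in for the cmdlet output, on the assumption (consistent with the comments below) that the cmdlet returns the whole file as one string with CRLF line endings. The variable names `$content`, `$once`, and `$lines` are illustrative only.

```powershell
# Stand-in for the cmdlet output: the whole file as ONE string (assumed CRLF line endings).
$content = "1|2|3|a,b,`r`n3|4|5|d,h,`r`n"

# Piping a single string runs the script block once; the string ends with a
# line break, not a comma, so TrimEnd(',') changes nothing.
$once = @($content | ForEach-Object { $_.TrimEnd(',') })

# Splitting on line breaks first yields one string per line; the empty
# trailing element is dropped before trimming each line.
$lines = @($content -split "`r?`n" |
    Where-Object { $_ -ne '' } |
    ForEach-Object { $_.TrimEnd(',') })

$once.Count   # 1
$lines        # 1|2|3|a,b and 3|4|5|d,h
```

Get-Content, by contrast, already emits one string per line, which is why the original one-liner works on a local file.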

Joy Wang
  • Hi Joy, thank you for your comments. If you look at my code, I have not tried to write any content yet, as I need to manipulate my data first. My idea was to pass the output of Get-AzureRmDataLakeStoreItemContent, which I believe is an object, to Set-AzureRmDataLakeStoreItemContent. But the problem arises in the first part, the file manipulation itself. The export-and-import workaround is not practical, as this needs to be part of an automated workflow. – Naveen Venugopal Nov 28 '18 at 08:41
  • Hi Joy, thank you for your suggestion, it worked. I have accepted the solution. If you would be kind enough to add some explanation, it would be very helpful. – Naveen Venugopal Nov 28 '18 at 10:14
  • @NaveenVenugopal The output of the command is only one object, so ForEach-Object runs once and can only trim a comma at the very end of the content. Locally, the output is two objects, one per line. – Joy Wang Nov 28 '18 at 10:42
  • Joy is right. `Get-AzureRmDataLakeStoreItemContent` returns a single string, so you can iterate only once (`Get-Content` returns an array, so you can iterate over all the items in it). – Ivan Glasenberg Nov 28 '18 at 13:07
  • @JoyWang@IvanYang Apologies for another question. When I assign the output of your command to a variable and pass it to New-AzureRmDataLakeStoreItem, I get the error "Invalid content passed in". Any input on how to make it work? With `$data` set to the output of your command, `$data.GetType()` reports Name `Object[]` and BaseType `System.Array`. Running `New-AzureRmDataLakeStoreItem -Account $account -path $myrootdir/test_output.txt -Value $data` then fails with: Invalid content passed in. Only byte[] and string content is supported. – Naveen Venugopal Nov 29 '18 at 06:18
  • @NaveenVenugopal Hi Naveen, per Stack Overflow policy, could you post it as a separate question? That will make it easier for others to refer to. – Joy Wang Nov 29 '18 at 06:23
  • @JoyWang Thanks. Here is the new thread: https://stackoverflow.com/questions/53533224/writing-output-of-string-manipulation-to-azure-data-lake-store-item – Naveen Venugopal Nov 29 '18 at 06:42
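Regarding the "Only byte[] and string content is supported" error quoted in the comments: the split-and-trim pipeline produces an `Object[]`, not a string. One possible approach, shown here as a hedged sketch and not as the answer from the linked thread, is to join the lines back into a single string first. The `$data` array below is a hypothetical stand-in for the cleaned lines, and the CRLF separator is an assumption about the desired line ending.

```powershell
# Hypothetical stand-in for the array of cleaned lines (an Object[]).
$data = @('1|2|3|a,b', '3|4|5|d,h')

# -join collapses the array into a single System.String, one line per element.
$text = $data -join "`r`n"

$text.GetType().FullName   # System.String
```

The resulting `$text` is a string, which is one of the two content types the error message says is accepted; passing it as `-Value` has not been verified against an ADLS account here.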