-1

I am getting data in csv file with double quotes around string columns but while reading csv file using U-sql i am getting errors because of double quotes in data as well.

I am thinking of replacing double quotes which is in data at first step then read that file but not sure how to do that as we have double quotes everywhere.

Any suggestions would be appreciated or if someone can help me giving the powershell or .net code to do the same that would be great help as I am not good in .net or powershell.

Sample Data

“Name”;”Department” “Abc”;”Education”Teaching” “Cde”;”Test”Another”

It should be

“Name”;”Department” “Abc”;”EducationTeaching” “Cde”;”TestAnother”

Maximilian Burszley
  • 18,243
  • 4
  • 34
  • 63
Raj
  • 462
  • 3
  • 15

3 Answers3

0

You can use a regex find/replace in Visual Studio Code. For example (and assuming that the data only contains letters, you can edit the regex as needed):

Find regex: "([a-zA-Z]+)"([a-zA-Z]+)"
Replace string: "$1$2"
Input string: "Name";"Department" "Abc";"Education"Teaching" "Cde";"Test"Another"
Output string: "Name";"Department" "Abc";"EducationTeaching" "Cde";"TestAnother"
emagers
  • 841
  • 7
  • 13
0

So it seems that your quotes are not the standard [Char]34. Instead they are [Char]8220; [Char]8221

So we need to do a replace in powershell

$TEST = @"
“Name”;”Department” “Abc”;”Education”Teaching” “Cde”;”Test”Another”
"@

$TEST | %{ 
    $_ = $_ -replace [char]8220, '"'
    $_ = $_ -replace [char]8221, '"'
    $_ -replace '"([a-zA-Z]+)"([a-zA-Z]+)"','"$2 $1"'
}

this would make the output :

"Name";"Department" "Abc";"Teaching Education" "Cde";"Another Test"
ArcSet
  • 6,518
  • 1
  • 20
  • 34
0

You could also do this in a custom row processor. Have the initial data read the CSV file into a variable as a single column row (raw data). Then pass each row through a row processor to parse the data and remove the offending characters. I've done something similar for handling Fixed Width text files.

Joel Cochran
  • 7,139
  • 2
  • 30
  • 43