
I have many functions in ksh scripts (which use gawk a lot) that do many computations on files. The files are pipe-delimited, but my source files have now changed: each field comes within double quotes, as below. I also have to trim the leading and trailing spaces or tabs, if any.

Old_Myfile.txt

Name|Designation|emlid
Alex|Software Design Engg|E0023
Corner|SDE|E0056

New_Myfile.txt

"Name"|"Designation"|"emlid"
"Alex"|"Software Design Engg"|" E0023"
"      Corner  "|"      SDE"|" E0056 "

Please suggest ways that will be compatible with my already written scripts.

DPR

3 Answers

With sed:

$ sed 's/[[:space:]]*"[[:space:]]*//g' file

Name|Designation|emlid
Alex|Software Design Engg|E0023
Corner|SDE|E0056

This can also be done inside the awk script itself, without the extra sed step.
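If you would rather leave the existing awk code untouched, the sed cleanup can instead stay a separate filter in front of it. Below is a minimal sketch under a few assumptions: `clean_quotes` and `sample_input` are hypothetical names, `[[:space:]]` is used instead of a literal space so that tabs are trimmed too, and the awk one-liner is only a stand-in for an existing computation. Note that it removes every double quote, including any inside a field.

```sh
#!/bin/sh
# Hypothetical helper: strip every double quote together with any
# spaces/tabs touching it. Reads the named files, or stdin if none.
clean_quotes() {
  sed 's/[[:space:]]*"[[:space:]]*//g' "$@"
}

# Sample rows in the new format; the tab before SDE is an assumption
# added to show that tabs are trimmed as well as spaces.
sample_input() {
  printf '%s\n' '"Name"|"Designation"|"emlid"' '"Alex"|"Software Design Engg"|" E0023"'
  printf '"      Corner  "|"\tSDE"|" E0056 "\n'
}

# The awk program stands in for an existing, unmodified computation.
sample_input | clean_quotes | awk -F'|' '{ print $1 ":" $3 }'
```

This prints `Name:emlid`, `Alex:E0023` and `Corner:E0056`, one per line.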

karakfa

This script may be over-engineered for what you need, but it operates on each field individually (inside the for loop), in case you need to add more logic later.

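BEGIN{
  FS="|";
  OFS="|";
}

{
  for(i=1; i<=NF; i++){
    # Remove the opening quote plus any spaces/tabs after it, and the
    # closing quote plus any spaces/tabs before it.
    gsub(/(^"[[:space:]]*|[[:space:]]*"$)/, "", $i);

    # Print the field, followed by OFS or, after the last field, a newline.
    if (i == NF) {
      printf("%s\n", $i);
    }
    else {
      printf("%s%s", $i, OFS);
    }
  }
}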

Here's the output:

$ awk -f /tmp/script.awk </tmp/input.txt
Name|Designation|emlid
Alex|Software Design Engg|E0023
Corner|SDE|E0056
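Because assigning to a field makes awk rebuild `$0` joined with `OFS`, the per-field loop can end in a plain `print` instead of the `printf` branching. A condensed sketch, assuming the same `|` separator; `strip_fields` is a hypothetical name, and `[[:space:]]` is used so that tabs are trimmed as well as spaces.

```sh
#!/bin/sh
# Hypothetical helper name; reads the named files, or stdin if none.
strip_fields() {
  awk 'BEGIN { FS = OFS = "|" }
  {
    # Drop the opening quote plus any spaces/tabs after it, and the
    # closing quote plus any spaces/tabs before it, field by field.
    for (i = 1; i <= NF; i++)
      gsub(/^"[[:space:]]*|[[:space:]]*"$/, "", $i)
    # Assigning to $i already rebuilt $0 joined with OFS.
    print
  }' "$@"
}

# Sample row with a tab inside the second field (an assumption).
printf '"      Corner  "|"\tSDE"|" E0056 "\n' | strip_fields
```

The sample row prints as `Corner|SDE|E0056`. Unlike the `printf` version, an empty input line is passed through as an empty line rather than dropped.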
wpcarro
  • gsub(/(^"[ ]*|[ ]*"$)/, "", $i); – DPR Sep 22 '16 at 06:20
  • I have used this solution, but with gsub(/(^"[ ]*|[ ]*"$)/, "", $i); a field that has spaces on both sides gets only its leading spaces trimmed. I had to modify the script to: gawk -F "|" ' {OFS="|" } { for (i=1;i<=NF;i++) sub(/\"$/, "", $i); } {for (i=1;i<=NF;i++) sub(/^\"/, "", $i); } {for (i=1;i<=NF;i++) sub(/^[[:space:]]+|[[:space:]]+$/, "", $i) } {print $0} ' $1 Why is the or (|) operator not working as expected? – DPR Sep 22 '16 at 06:31
  • Why did you change `gsub` to `sub`? `gsub` substitutes every match in the target string ("globally"), whereas `sub` replaces only the first one. A single loop with `gsub` is also preferable to several loops that each do a `sub`. Also, there's no difference between `gawk -F "|"` and `gawk 'BEGIN{FS="|"...'`; just pointing that out in case you didn't know. As for your regex, try wrapping the alternation in parentheses, the way the regex I supplied is wrapped. Also, I'm not sure why you needed to modify the script. The output in the post is the output you expect, right? What edge cases are missing? Care to provide those? – wpcarro Sep 22 '16 at 14:05
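In the modified script from DPR's comment, the last loop calls `sub` with an alternation of a leading and a trailing pattern. Because `sub` replaces only the first (leftmost) match, it trims only one end of each field, while `gsub` trims both. A minimal demonstration; `demo_sub_gsub` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: show sub() vs gsub() on each input line.
demo_sub_gsub() {
  awk '{
    s = $0; sub(/^[[:space:]]+|[[:space:]]+$/, "", s)   # first match only
    g = $0; gsub(/^[[:space:]]+|[[:space:]]+$/, "", g)  # every match
    printf "sub:  [%s]\ngsub: [%s]\n", s, g
  }'
}

printf '  Corner  \n' | demo_sub_gsub
```

This prints `sub:  [Corner  ]` followed by `gsub: [Corner]`.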

If your quoted fields cannot contain `|` characters, add this as the first line of your existing awk script:

awk '
{ gsub(/[[:space:]]*"[[:space:]]*/,"") }
<existing script>
'
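To make the first-line approach concrete, here is a sketch with a hypothetical stand-in for the existing script (skip the header, print name and id). It assumes the field separator is set with `-F'|'`; the first rule rewrites `$0`, which makes awk re-split the record, so every later rule sees the cleaned fields.

```sh
#!/bin/sh
# Sample input (the tab before SDE is an assumption for illustration).
sample_input() {
  printf '%s\n' '"Name"|"Designation"|"emlid"' '"Alex"|"Software Design Engg"|" E0023"'
  printf '"      Corner  "|"\tSDE"|" E0056 "\n'
}

sample_input | awk -F'|' '
# New first rule: strip each quote plus surrounding spaces/tabs from the
# whole record. Changing $0 makes awk re-split it into fields with FS.
{ gsub(/[[:space:]]*"[[:space:]]*/, "") }

# Hypothetical stand-in for the existing script: skip the header and
# print name and id, exactly as it would on the old format.
NR > 1 { print $1 " -> " $3 }
'
```

This prints `Alex -> E0023` and `Corner -> E0056`.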
Ed Morton