0

I have an issue with columns delimiters adjustment in a file in linux into a database.

I need 14 columns and I use "|" as a delimiter so I applied :

awk -F'|' '{missing=14-NF;if(missing==0){print $0}else{printf "%s",$0;for(i=1;i<=missing-1;i++){printf "|"};print "|"}}' myFile

Suppose I have a row like that:

a|b|c|d|e||f||g||||h|i|

after applying the awk command it will be:

a|b|c|d|e||f||g||||h|i||

and this is not acceptable I need the data to be 14 columns only.

Sample input {In case of 14 fields row]:

a|b|c|d|e||f||g||||h|i

Do nothing

Sample input {In case of extra fields]:

a|b|c|d|e||f||g||||h|i|

ouput:

a|b|c|d|e||f||g||||h|i

Sample Input {In case of less fields}:

a|b|c|d||e||f||g|h

output:

a|b|c|d||e||f||g|h|||
anubhava
  • 761,203
  • 64
  • 569
  • 643

1 Answers1

0

You may use this gnu-awk solution:

awk -v n=14 '
BEGIN {FS=OFS="|"}
{
   $0 = gensub(/^(([^|]*\|){13}[^|]*)\|.*/, "\\1", "1")
   for (i=NF+1; i<=n; ++i)
      $i = ""
} 1' file

a|b|c|d|e||f||g||||h|i
a|b|c|d|e||f||g||||h|i
a|b|c|d||e||f||g|h|||

Where original file is this:

cat file

a|b|c|d|e||f||g||||h|i
a|b|c|d|e||f||g||||h|i|
a|b|c|d||e||f||g|h

Here:

  • Using gnsub we remove all extra fields
  • Using for loop we create new fields to make NF = n

If you don't have gnu-awk then following should work on non-gnu awk (tested on BSD awk):

awk -v n=14 '
BEGIN {FS=OFS="|"}
{
   for (i=NF+1; i<=n; ++i) $i=""
   for (i=n+1; i<=NF; ++i) $i=""
   NF = n
} 1' file
anubhava
  • 761,203
  • 64
  • 569
  • 643
  • Hi, this shall work if NF=14 or less.. but in case of an extra delimiter, it will delete the last delimiter and the last value. please check the output for extra fields again – Mina El-Raheb Sep 16 '22 at 08:10
  • Out of 3 lines shown in my generated output which one is not matching your expected output? – anubhava Sep 16 '22 at 08:40
  • 1
    The second output... input: a|b|c|d|e||f||g||||h|i| ---> output: a|b|c|d|e||f||g||||h| It should be a|b|c|d|e||f||g||||h|i – Mina El-Raheb Sep 16 '22 at 08:43
  • ok got it. I have updated my answer now. It matches with your expected output. – anubhava Sep 16 '22 at 11:39