I have to process pipe-delimited flat files in which every field is enclosed in double quotes.
Sample data:
"1193919"|"false"|""|"Mr. Andrew Christopher Alman"|""|""|"Mr."
I have written many gawk commands in my scripts, and now I have run into the following problems.
Issue 1:
Consider this row: "1193919|false||Mr. Andrew Christopher Alman"|""|"Mr."
My script splits the row above into 6 separate fields:
"1193919
false
[null]
Mr. Andrew Christopher Alman"
[null]
"Mr."
However, the files are sent with the intent that
"1193919|false||Mr. Andrew Christopher Alman"
is a single field, because it is enclosed in double quotes.
My idea was to change the field separator from | to "|".
The problem is that the first and last fields then keep a stray quote: they come out as "1193919 and Mr."
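One way around the stray quotes is to remove the single opening and closing quote of the whole record before splitting on the literal "|". The sketch below shows that idea in Python; the helper name split_trimmed is hypothetical, and it assumes every record starts and ends with a quote. Note that it returns the fields without their quotes, so on its own it would not support the quote check in issue 2. In gawk the same two steps would presumably be two sub() calls on $0 (modifying $0 makes gawk re-split the record) with FS set to "\"[|]\"", the pipe bracketed so it is not read as regex alternation.

```python
# Sketch: strip the record's outer quotes, then split on the literal '"|"'.
# split_trimmed is a hypothetical helper name used only for illustration.


def split_trimmed(line: str) -> list[str]:
    """Return the unquoted fields of a record whose fields are all quoted."""
    # Remove exactly one opening and one closing quote from the whole record,
    # so the first and last fields no longer keep a stray quote.
    if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
        line = line[1:-1]
    # What remains is separated by the three-character string '"|"'.
    return line.split('"|"')


row = '"1193919|false||Mr. Andrew Christopher Alman"|""|"Mr."'
print(split_trimmed(row))
# prints ['1193919|false||Mr. Andrew Christopher Alman', '', 'Mr.']
```

The field count stays the same as with a plain | separator on good data (7 fields for the sample row at the top), which is what keeps the rest of the field-numbering code unchanged.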
I don't want to use '["][|]["]|^["]|["]$' as the field separator, because it would increase the number of fields (an extra empty field before the first and after the last) and the rest of my code would need major changes.
What I am looking for is something like this: treat | as a field separator only when it is preceded by " and followed by ". The separator itself should still be |, not "|", so the quotes stay part of each field.
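The rule "split on | only when it is preceded by " and followed by "" is exactly a lookbehind/lookahead pattern. gawk's regular expressions do not support lookaround, so FS cannot express it directly (gawk's FPAT variable, which describes what a field looks like rather than what separates fields, is probably the closest built-in tool). The minimal Python sketch below only illustrates the rule itself; SEPARATOR and split_quoted are hypothetical names, and it assumes no field value contains the sequence "|" internally, since that would still be treated as a boundary.

```python
import re

# Split on '|' only when it is directly preceded by '"' and directly followed
# by '"'. The lookbehind and lookahead are zero-width, so only the single '|'
# is consumed and the quotes stay inside the fields.
SEPARATOR = re.compile(r'(?<=")\|(?=")')


def split_quoted(line: str) -> list[str]:
    """Return the fields of one record, each still wrapped in its quotes."""
    return SEPARATOR.split(line)


row = '"1193919|false||Mr. Andrew Christopher Alman"|""|"Mr."'
for field in split_quoted(row):
    print(field)
```

For the row from issue 1 this yields three fields: "1193919|false||Mr. Andrew Christopher Alman", "" and "Mr.", with the quotes kept.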
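Issue 2:
"1193919""|"false"""|""|"Mr. Andrew Christopher Alman"
At the same time, I want to report an error for malformed fields such as "false""", using a check along the lines of /^"["]+|["]+["]$/ (a field that starts or ends with more than one quote), but without flagging the legitimately empty field /^""$/.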
Good data should look like this:
"1193919"|"false"|""|"Mr. Andrew Christopher Alman"
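A simple way to express "good data" is the positive pattern ^"[^"]*"$: one quote, any non-quote characters, one quote. It accepts the empty field "" and rejects both "1193919"" and "false""" from the issue 2 row. The Python sketch below combines it with the lookaround split; bad_fields, SEPARATOR and GOOD_FIELD are hypothetical names, and the same assumption about "|" never appearing inside a value applies.

```python
import re

# Split only on a '|' that sits between two quotes (quotes kept in fields).
SEPARATOR = re.compile(r'(?<=")\|(?=")')
# A well-formed field: one quote, any non-quote characters, one quote.
# This accepts "" and rejects fields such as "false""" or "1193919"".
GOOD_FIELD = re.compile(r'"[^"]*"')


def bad_fields(line: str) -> list[tuple[int, str]]:
    """Return (1-based field number, field) for every malformed field."""
    return [
        (number, field)
        for number, field in enumerate(SEPARATOR.split(line), start=1)
        if not GOOD_FIELD.fullmatch(field)
    ]


bad = '"1193919""|"false"""|""|"Mr. Andrew Christopher Alman"'
for number, field in bad_fields(bad):
    print(f"error: field {number} is malformed: {field}")
```

Checking the whole field against a positive pattern is slightly stricter than /^"["]+|["]+["]$/: it also rejects a stray quote in the middle of a value, while the row from issue 1 and the good-data row both pass with no errors.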