Script to find valid and invalid records CSV file

Question

I'm trying to separate Valid and Invalid records in a CSV file based on a number of columns. I went through the below SO question and seems to be doing the same.

Delete row which has more than X columns in a csv

However for my case delimiter is ^H, so tried below approaches.

awk -v FS="\b" 'NF==3' sample.csv >> output.csv
awk -v FS="\\^H" 'NF==3' sample.csv >> output.csv
awk -v FS="\\cH" 'NF==3' sample.csv >> output.csv
awk -v FS="^H" 'NF==3' sample.csv >> output.csv

However, nothing worked for ^H delimiter.

Moreover, in the previous SO question, they are capturing the valid records(to output), I want to capture both and invalid records into two different files.

Sample.csv

timestamp,header2,header3
1^H1val2^H1val3
2^H2val2^H2val3
3^H4^H4val2^H4val3
5^H5val2^H5val3
6^H6val2^H6val3


Valid.csv

timestamp,header2,header3
1^H1val2^H1val3
2^H2val2^H2val3
5^H5val2^H5val3
6^H6val2^H6val3


Invalid.csv

timestamp,header2,header3
3^H4^H4val2^H4val3

Any suggestions please.

Why are you treating `^H` as part of valid file content? – Inian Oct 11 '18 at 09:24 — Inian, Oct 11 '18 at 09:24
because that's delimiter of the CSV file @Inian – data_addict Oct 11 '18 at 09:25 — data_addict, Oct 11 '18 at 09:25

RavinderSingh13 · Accepted Answer · 2018-10-11T10:02:57.827

2

Could you please try following. It is basically looking for 2 conditions 1st is checking if Number of fields are 3 then place output into Valid.csv if number of fields are more than 3 then place output into Invalid.csv output file.

awk -F'\\^H' 'NF==3 && FNR>1{print > "Valid.csv"} NF>3 && FNR>1{print > "Invalid.csv"} FNR==1{print $0 > "Valid.csv";print $0 > "Invalid.csv"}' sample.csv

Adding non-one liner form of solution too now.

awk -F'\\^H' '
NF==3 && FNR>1{
  print > "Valid.csv"
}
NF>3 && FNR>1{
  print > "Invalid.csv"
}
FNR==1{
  print $0 > "Valid.csv"
  print $0 > "Invalid.csv"
}' sample.csv

edited Oct 11 '18 at 10:02

answered Oct 11 '18 at 09:29

RavinderSingh13

130,504
14
57
93

Hi @RavinderSingh13, I tried the above one. One observation is header is missing in both valid and invalid files. I didn't the FNR one, Could you please it, as the number of columns may change for different file want to make sure, I understand the command. – data_addict Oct 11 '18 at 09:38
@user805, yes if column may vary you could change conditions, for headers, let me fix it now. – RavinderSingh13 Oct 11 '18 at 09:45
@user805, could you please check now my edited solution and let me know then? – RavinderSingh13 Oct 11 '18 at 09:49

Script to find valid and invalid records CSV file

1 Answers1