0

I'm trying to separate Valid and Invalid records in a CSV file based on a number of columns. I went through the below SO question and seems to be doing the same.

Delete row which has more than X columns in a csv

However for my case delimiter is ^H, so tried below approaches.

awk -v FS="\b" 'NF==3' sample.csv >> output.csv
awk -v FS="\\^H" 'NF==3' sample.csv >> output.csv
awk -v FS="\\cH" 'NF==3' sample.csv >> output.csv
awk -v FS="^H" 'NF==3' sample.csv >> output.csv

However, nothing worked for ^H delimiter.

Moreover, in the previous SO question, they are capturing the valid records(to output), I want to capture both and invalid records into two different files.

Sample.csv

timestamp,header2,header3
1^H1val2^H1val3
2^H2val2^H2val3
3^H4^H4val2^H4val3
5^H5val2^H5val3
6^H6val2^H6val3


Valid.csv

timestamp,header2,header3
1^H1val2^H1val3
2^H2val2^H2val3
5^H5val2^H5val3
6^H6val2^H6val3


Invalid.csv

timestamp,header2,header3
3^H4^H4val2^H4val3

Any suggestions please.

data_addict
  • 816
  • 3
  • 15
  • 32

1 Answers1

2

Could you please try following. It is basically looking for 2 conditions 1st is checking if Number of fields are 3 then place output into Valid.csv if number of fields are more than 3 then place output into Invalid.csv output file.

awk -F'\\^H' 'NF==3 && FNR>1{print > "Valid.csv"} NF>3 && FNR>1{print > "Invalid.csv"} FNR==1{print $0 > "Valid.csv";print $0 > "Invalid.csv"}' sample.csv

Adding non-one liner form of solution too now.

awk -F'\\^H' '
NF==3 && FNR>1{
  print > "Valid.csv"
}
NF>3 && FNR>1{
  print > "Invalid.csv"
}
FNR==1{
  print $0 > "Valid.csv"
  print $0 > "Invalid.csv"
}' sample.csv
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
  • Hi @RavinderSingh13, I tried the above one. One observation is header is missing in both valid and invalid files. I didn't the FNR one, Could you please it, as the number of columns may change for different file want to make sure, I understand the command. – data_addict Oct 11 '18 at 09:38
  • @user805, yes if column may vary you could change conditions, for headers, let me fix it now. – RavinderSingh13 Oct 11 '18 at 09:45
  • @user805, could you please check now my edited solution and let me know then? – RavinderSingh13 Oct 11 '18 at 09:49