2

I have a file that looks like this (note: A*, B*, C* are placeholders). The file is delimited by ;

AAAA;BBBB;CCCCCCCC;DD;EEEEEEEE;FF;
AAA1;BBBBB;CCCC;DD;EEEEEEEE;FFFFF;
AAA3;BB;CCCC;DDDDDDDDD;EEEEEEE;FF;

I'm trying to write a small script that counts the number of occurrences of the delimiter ; and, if it is less than or greater than 5, outputs that line to a text file.

delim=";"

while read line
do  
    n_of_occ=$(grep -o "$delim" <<< "$line" | wc -l)

    if [[ $n_of_occ < 5 ]] || [[ $n_of_occ > 5 ]]
    then
        echo $line >> outfile
    fi
done

For some reason, this doesn't seem to work and my output is garbled. Could someone assist or provide a different way to tackle this? Perhaps with Perl instead of bash?

serenesat
onlyf
  • You should try to supply a properly representative set of data. Every line of your sample has six semicolons `;` which, according to your rules, means they should all be printed. Once you have said *"The file is delimited by `;`"* there's little point in giving an example unless it tests the criteria and is accompanied by your corresponding required output – Borodin May 17 '16 at 13:27

5 Answers

3

This is ridiculously easy with awk:

awk -F\; 'NF!=6' file > outfile
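
The test is `NF!=6` rather than `NF!=5` because splitting on `;` yields one more field than there are delimiters (the trailing `;` leaves an empty last field). A quick check, using made-up sample lines and a hypothetical helper named `count_fields`:

```bash
# Sample lines are invented for illustration: they hold 5, 6, and 2 semicolons.
count_fields() {
    printf '%s\n' 'a;b;c;d;e;' 'a;b;c;d;e;f;' 'a;b;' |
        # NF is the field count, so NF - 1 is the number of delimiters
        awk -F';' '{ print NF - 1, $0 }'
}
count_fields
```

Each output line starts with the delimiter count, so only the first sample line would be dropped by `NF!=6`.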

Juan Diego Godoy Robles
1

I would take this one-liner:

awk '{x=$0}gsub(";","",x)!=5' file
Kent
1

Easy in Perl:

perl -ne 'print if tr/;// != 5' input_file > output_file
  • -n reads the input line by line
  • the tr operator returns the number of matches
choroba
1

With sed you can do this:

sed '/^\([^;]*;\)\{5\}$/d' file > outfile

It deletes the lines that have exactly 5 semicolons (;) and end with one, and sends the remaining lines to outfile.


Or if you want your own code to work, then make the following changes:
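  1. replace done with done <file, so that the loop actually reads from the file
  2. replace [[ with (( and ]] with )), i.e. use ((...)) instead of [[...]], because < and > inside [[...]] compare strings rather than numbers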
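
Put together, the two changes might look like the sketch below. Wrapping the loop in a function named `filter_lines` that takes the input file as `$1` is my own assumption, as are `IFS= read -r` and `printf` (added so whitespace, backslashes, and glob characters in a line survive unchanged); the counting logic is the one from the question.

```bash
#!/usr/bin/env bash
delim=";"

# Hypothetical wrapper: print every line of "$1" that does not
# contain exactly 5 delimiters.
filter_lines() {
    while IFS= read -r line
    do
        # Count the delimiters on this line, as in the question
        n_of_occ=$(grep -o "$delim" <<< "$line" | wc -l)

        # (( )) compares numbers; inside [[ ]], < and > compare strings
        if (( n_of_occ < 5 || n_of_occ > 5 ))
        then
            printf '%s\n' "$line"
        fi
    done < "$1"
}

# Usage: filter_lines file > outfile
```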
Jahid
1

Unfortunately, every line in your sample data has six semicolons, which means they should all be printed. However, here is a one-line Perl solution:

$ perl -ne'print if tr/;// != 5' aaa.csv
AAAA;BBBB;CCCCCCCC;DD;EEEEEEEE;FF;
AAA1;BBBBB;CCCC;DD;EEEEEEEE;FFFFF;
AAA3;BB;CCCC;DDDDDDDDD;EEEEEEE;FF;
Borodin