
Let's assume we have a line like this:

383;06;55.270989;144991494994851A5485AA54J7HH337H3H33HT570BBG7BBGBT07BT7R55U155U5IR75I79QQ9SQQ9Q597Q57S229122928S4284;N

But further down the file we encounter something like this:

383;06;55.270989;||<FD><F0>p|/x|<A9>|<E2>|,|<F7>|l|L@<F5>q|I|b%<EB><AB><C2>l|F|<D7>%|<C0><E4>wy||z<BE>|;|b<E5>&x"h<D1>e|j|E|c|<F4><E1><C2>4^|Q|<EF>H|<E0>2t<C2>6'<E4><C7>||Z|<E0>q|9d|;N

Is there a way to check each line of the text file and, if it does not have x fields (separator `;`), remove it from the file and place it in a log file?

Edit: the method should also keep a log of the data that is removed, for later analysis.

Jorge Y. C. Rodriguez
  • If the decision is based on the number of fields, `awk` is a good choice. If you just want to skip lines based on a character, say `|` or `<`, then you can use `grep`. Either way, give it a shot; there are plenty of duplicates around. – Sundeep Jun 14 '17 at 09:58
  • The decision is based on the number of fields, but I have no idea where to start :( – Jorge Y. C. Rodriguez Jun 14 '17 at 09:59
  • https://www.gnu.org/software/gawk/manual/gawk.html is the best place to start. Use https://www.gnu.org/software/gawk/manual/gawk.html#Patterns-and-Actions and https://www.gnu.org/software/gawk/manual/gawk.html#Field-Separators – Sundeep Jun 14 '17 at 10:05
  • Possible duplicate of [Filtering Rows Based On Number of Columns with AWK](https://stackoverflow.com/questions/3393895/filtering-rows-based-on-number-of-columns-with-awk) – RomanPerekhrest Jun 14 '17 at 10:07
  • Not really a duplicate of that, because it doesn't tell how to write to a log file – Jorge Y. C. Rodriguez Jun 14 '17 at 10:10
  • @jycr753, ok, the solution is relatively easy. Post more context (more lines) with the expected filtered result and the resulting log file contents – RomanPerekhrest Jun 14 '17 at 10:24

1 Answer


To produce two outputs, redirect the `print` for the lines you want to remove to a log file and let the remaining lines go to standard output. Write the lines you want to keep to a tmp file, then move it back over your input:

$ cat input
383;06;55.270989;144991494994851A5485AA54J7HH337H3H33HT570BBG7BBGBT07BT7R55U155U5IR75I79QQ9SQQ9Q597Q57S229122928S4284;N
383;06;55.270989;||<FD><F0>p|/x|<A9>|<E2>|,|<F7>|l|L@<F5>q|I|b%<EB><AB><C2>l|F|<D7>%|<C0><E4>wy||z<BE>|;|b<E5>&x"h<D1>e|j|E|c|<F4><E1><C2>4^|Q|<EF>H|<E0>2t<C2>6'<E4><C7>||Z|<E0>q|9d|;N

$ awk -F';' 'NF != 5 { print > "logfile.log"; next } 1' input > tmp && mv tmp input

$ cat logfile.log
383;06;55.270989;||<FD><F0>p|/x|<A9>|<E2>|,|<F7>|l|L@<F5>q|I|b%<EB><AB><C2>l|F|<D7>%|<C0><E4>wy||z<BE>|;|b<E5>&x"h<D1>e|j|E|c|<F4><E1><C2>4^|Q|<EF>H|<E0>2t<C2>6'<E4><C7>||Z|<E0>q|9d|;N

$ cat input
383;06;55.270989;144991494994851A5485AA54J7HH337H3H33HT570BBG7BBGBT07BT7R55U155U5IR75I79QQ9SQQ9Q597Q57S229122928S4284;N
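If you would rather not touch the original file at all, something along these lines might work as a starting point. It is only a sketch: the file names (`sample.txt`, `clean.txt`, `rejected.log`) and the variables `want` and `log_file` are made up for illustration, and the `printf` line just creates a small stand-in for the real data. Each rejected line is logged with its line number and field count, which may help with the later analysis.

```shell
# Hypothetical file names; adjust to your own data.
in=sample.txt
want=5            # expected number of ';'-separated fields

# Small stand-in for the real data: lines 1 and 3 have 5 fields, line 2 has 6.
printf '%s\n' 'a;b;c;d;N' 'a;b;c;|x|;|y|;N' 'e;f;g;h;N' > "$in"

# Keep well-formed lines in clean.txt; log the rest with line number and field count.
awk -F';' -v want="$want" -v log_file=rejected.log '
    NF != want { print NR ": NF=" NF ": " $0 > log_file; next }
    { print }
' "$in" > clean.txt
```

Note that awk only creates the log file when at least one line is rejected, so a log left over from an earlier run would stay in place if the current input is clean.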
jas
  • I'm a total noob with this kind of command. How do I do this when all those lines are inside a file? – Jorge Y. C. Rodriguez Jun 14 '17 at 10:47
  • In my example, the name of the file with the original lines is "input". Just replace that with the real name of your file (in the two places where it occurs). You might also want to replace "logfile.log" with a more meaningful name. BE CAREFUL: after you run the command you'll have replaced your original file! Make sure you keep a copy :-) – jas Jun 14 '17 at 11:42
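
One possible way to follow the "keep a copy" advice is to make the backup part of the command itself. This is a rough sketch, not the answerer's exact method: `input_demo.txt` is a hypothetical file name, the `.bak` suffix is an arbitrary choice, and the `printf` line only creates demo data so the snippet can be run on its own.

```shell
file=input_demo.txt     # hypothetical name; replace with your real file
printf '%s\n' 'a;b;c;d;N' 'a;b;|;|;c;d;N' > "$file"   # demo data only

# Keep an untouched copy first, then filter from the copy back into the original.
# The && means awk only runs if the backup was made successfully.
cp -p "$file" "$file.bak" &&
awk -F';' 'NF != 5 { print > "logfile.log"; next } 1' "$file.bak" > "$file"
```

Afterwards the original file holds only the 5-field lines, the `.bak` file still holds everything, and `logfile.log` holds the removed lines.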