I'm trying to use awk to implement this rule: if a line doesn't start with "O|", "A|" or "S|", I want to remove the newline at the end of the previous line.

I have this input file (input.txt):

O|field1|field2
O|field1|
field2
A|field1|
field2
S|field1|
field2
O|field1|field2
O|field1|field2
O|field1|
field2
A|field1|
field2
S|field1|
field2
O|field1|field2

I want this output

O|field1|field2
O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2
O|field1|field2
O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2

Executing this code

awk '/^O\|/ || /^A\|/ || /^S\|/ {printf "%s", $0; next} 1 {print}' input.txt > output.txt

It returns

O|field1|field2O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2O|field1|field2O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2

Can somebody help me, please?

Luca L

8 Answers

This awk should work for you:

awk -F'|' 'NF==3 && $3 == "" {p = $0; next}
      {print (NF == 1 ? p $1 : $0); p = ""}' file

O|field1|field2
O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2
O|field1|field2
O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2
anubhava

With your shown samples, please try the following awk code.

awk '
BEGIN{FS=OFS="|"}
!/\|/ {
  print val $0   # val already ends in "|", so concatenate without OFS
  val=""
  next
}
$0~/\|$/ && NF==3{
  val=$0
  next
}
1
' Input_file
RavinderSingh13

Something like this, which tests the layout of the records, might be better for you than testing the values of the fields:

$ awk -v RS='([^|]*[|]){2}[^|]*\n' '{$0=RT; gsub(/\n/,""); print}' file
O|field1|field2
O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2
O|field1|field2
O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2

The above uses GNU awk for multi-char RS to just define a record as being 3 fields separated by |s and ending in a newline, then removes any newlines from each record before printing it.

Ed Morton

You only seem to have the issue when the last field is missing.

If | is the field delimiter, you can check whether the 3rd field is non-empty and, if so, print the whole line.

If field 1 is not A, O or S, print the previous line followed by the current line:

awk -F'|' '{
  if($1 !~ /^[AOS]$/) { print p $0; next }
  if ($3!="") print $0
  p = $0
}' file

Output

O|field1|field2
O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2
O|field1|field2
O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2
The fourth bird

Another solution:

awk -v RS="" '{gsub("\\|\n","|")}1' file

O|field1|field2
O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2
O|field1|field2
O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2

This treats the file as one long record (with RS="", awk reads in paragraph mode, and the sample has no blank lines) and removes the newline after each trailing pipe.
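
As a rough illustration of how paragraph mode behaves, here is a small sketch. The sample data and the variable names `sample` and `joined` are made up for this example. A blank line is added on purpose to show that it splits the input into two records instead of one.

```shell
#!/bin/sh
# Hypothetical sample: two broken records separated by a blank line.
sample='O|a|
b

S|c|
d'

# RS='' selects paragraph mode: each run of non-blank lines is one record.
# gsub() then removes the newline that follows a trailing pipe.
# NR is printed only to make the record boundaries visible.
joined=$(printf '%s\n' "$sample" |
  awk -v RS='' '{ gsub(/[|]\n/, "|"); print NR ": " $0 }')

printf '%s\n' "$joined"
```

If the real input can contain blank lines, note that they are consumed as record separators and do not appear in the output.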

karakfa
{m,g,n}awk NF=NF RS= OFS=\| FS='[|]\n'
{  g,n}awk NF=NF RS= OFS=\| FS='\\|\n' 
{m    }awk NF=NF RS= OFS=\| FS='\|\n' 
O|field1|field2
O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2
O|field1|field2
O|field1|field2
A|field1|field2
S|field1|field2
O|field1|field2
RARE Kpop Manifesto
    Interesting solution, can you explain a bit how this works? What does `NF=NF` do, and can you only place this at the beginning? – The fourth bird Dec 20 '22 at 09:28
    @Thefourthbird : `NF = NF` is the same as what others do with `$1=$1`, but you can safely type it in the console terminal, unquoted. `mawk` takes command-line assignments as is regarding backslashes, while `gawk` and `nawk` treat them just like any double-quoted string inside the main code, thus necessitating double backslashes, as you can see in the 2 different variants of `FS`. If you place any of the stuff after `NF=NF` to its left, you must also add the `-v` flag prefix, e.g. `-v OFS=…`. Doing it after the main code allows for skipping that part, but those are processed ….. – RARE Kpop Manifesto Dec 21 '22 at 07:32
    ….. after all `BEGIN { }` sections, if any, but prior to `NR == 1`. Setting `RS` to blank means all chunks of input without entirely zero length blank lines in between get processed at once, and I simply swapped the roles of `FS` and `RS`, and use `FS + OFS` to fix the formatting issue. So these solutions are designed for inputs that aren't continuously piped in in nature. – RARE Kpop Manifesto Dec 21 '22 at 07:34
    @Thefourthbird : `$1 = $1` is actually a problematic form: `echo ' abc xyz ' | mawk '$1=$1' FS='[ ]+'` prints absolutely nothing, because `$1` is actually an empty string, and `"abc"` resides in `$2`. The assignment is the same as doing `$1 = ""`, which means the pattern evaluates an empty string to boolean `FALSE`, so the default action of `{ print }` is not performed. `NF=NF` fixes that issue, keeping in mind that it still skips empty rows since `NF` is zero – RARE Kpop Manifesto Dec 21 '22 at 07:47
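
The points made in the comments above can be checked with a short sketch. The sample line and the variable names (`line`, `out_dollar1`, `out_nf`, `out_assign`, `v`) are assumptions made for this illustration, and plain `awk` is used instead of a specific variant.

```shell
#!/bin/sh
# Hypothetical sample line with leading and trailing spaces.
line=' abc xyz '

# With the regex FS '[ ]+', the leading space makes $1 empty, so the
# pattern $1=$1 evaluates to "" (false) and nothing is printed.
out_dollar1=$(printf '%s\n' "$line" | awk '$1=$1' FS='[ ]+' OFS='|')

# NF=NF evaluates to the non-zero field count, so the pattern is true;
# the assignment also rebuilds the record using OFS before printing.
out_nf=$(printf '%s\n' "$line" | awk 'NF=NF' FS='[ ]+' OFS='|')

# An assignment placed after the program is applied after BEGIN has run
# but before the first record is read.
out_assign=$(printf 'x\n' |
  awk 'BEGIN { print "BEGIN: [" v "]" } { print "main: [" v "]" }' v=set)

printf '$1=$1 printed: [%s]\n' "$out_dollar1"
printf 'NF=NF printed: [%s]\n' "$out_nf"
printf '%s\n' "$out_assign"
```

The third command is why `FS` and `OFS` given after the program are not visible inside a `BEGIN` block.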
With GNU sed:

sed -rz 's/\|\n([^OAS])/\|\1/g' input.txt
Walter A

How about:

awk '/^[OAS]\|/ {if (l){print l}l=$0;next} {l=l $0} END {print l}' inputFile

The variable l represents the line we are building.

If the current line begins with O|, A| or S|, print l (if it is not empty) and start a new l from the current line. Otherwise, append the current line to l. The END block runs after all lines have been processed and prints the last l.
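
The same line-building logic can be spelled out with comments, as a minimal sketch. The function name `join_records` and the sample input are hypothetical, and `[|]` is used in place of `\|` only to sidestep escaping differences.

```shell
#!/bin/sh
# join_records: hypothetical helper; reads lines on stdin and writes the
# joined records to stdout.
join_records() {
  awk '
    /^[OAS][|]/ {             # this line starts a new record
      if (l != "") print l    # flush the record built so far
      l = $0                  # start building the next one
      next
    }
    { l = l $0 }              # continuation line: append it to l
    END { if (l != "") print l }   # flush the final record
  '
}

printf 'O|field1|\nfield2\nA|field1|field2\n' | join_records
```

Because a record is only printed when the next one starts (or at END), this also handles records split over more than two lines.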

Martin York