awk how to print one column only but other columns have more than one word

Question

This is a sample data:

+-------+--------------------+-----------+------------+-----------+-------------+
| ID    | Name               | Status    | Networks   | Image     | Plan        |
+-------+--------------------+-----------+------------+-----------+-------------+
| 1wsd  | HostName A         | PAUSED    | IP=1.1.1.1 | Ubuntu20  | PlanA BGP40 |
| 4fgh  | An other hostname  | ACTIVE    | IP=2.2.2.2 | Ubuntu20  | PlanB BGP30 |
| zxd1  | final.destination  | REBOOTING | IP=3.3.3.3 | Debian11  | PlanA BGP10 |
| 60hn  | no problem         | ACTIVE    | IP=4.4.4.4 | Centos7   | Plan BGP90  |
+-------+--------------------+-----------+------------+-----------+-------------+

I want to print only Plan column only, but as you see, it's not a fixed column number. For example for 1.1.1.1, the plan name is in the from 11 to the end (let's suppose from the end, but we can remove the | at the end of line).

First, it should filter only the Plan column (the format is like this and is table-like), and we can exclude the first three lines of header and the one last line of the tail so that we can only have the plan names.

Expected output is only plan names:

PlanA BGP40
PlanB BGP30
PlanA BGP10
Plan BGP90

I'm googling but didn't find a way up to now.

You can change the default field separator to what suits your needs: `awk -F'[|]' ...`. Or, to also include the spaces around `|`: `awk -F'[[:space:]]*[|][[:space:]]* ...`. Note: as `|` is a regex operator we cannot just `awk -F'|'`, reason why we use `[|]`. We could also `awk -F'\\|'` (yes, double backslash). — Renaud Pacalet, Jul 31 '23 at 10:31
@RenaudPacalet thanks, yes that's it. I think I was writing wrong because I thought of it but didn't work:( — Saeed, Jul 31 '23 at 10:34
Aside from the suggestion by Ranaud Pacalet: If your columns are always the same fixed width, you could also simply extract the plan names by character position. — user1934428, Jul 31 '23 at 12:13
You don't need to google as I showed you exactly how to do that in [my answer to your previous question](https://stackoverflow.com/a/76797638/1745001) - note the `-F'[[:space:]]*[|][[:space:]]*'` being used to read the same type of `|`-separated data as you show in this question. Please read (and try) the answers you get and ask questions under the answer if there's something in it you don't understand about how it works and/or unexpected output. — Ed Morton, Jul 31 '23 at 12:20

Paolo · Accepted Answer · 2023-07-31T12:46:15.380

You can use:

$ awk -F '|' 'NR>4 {gsub(/^[[:space:]]+|[[:space:]]+$/,"",v); print v} {v=$(NF-1)}' file
PlanA BGP40
PlanB BGP30
PlanA BGP10
Plan BGP90

Set field separator to | (-F '|')
Skip the first three lines and the last one. See here for how this works
Create a variable v which holds the second to last entry (v=$(NF-1)), then remove all leading and trailing whitespace from it (gsub(/^[[:space:]]+|[[:space:]]+$/,"",v))

score 1 · Answer 2 · answered Jul 31 '23 at 12:21

If you're interested in a sed alternative

sed '
  1,3d                             # delete the first 3 lines
  $d                               # delete the last line
  s/[[:blank:]]*|[[:blank:]]*$//   # remove the trailing border
  s/.*|[[:blank:]]*//              # consume all the leading cells.
' file

With sed's basic regular expressions, | is a plain character.

techno · Answer 3 · 2023-07-31T12:50:36.680

1

using `sed`simply:

1st remove last space + pipe: |
Then remove all untils pipe + space: .*|
Oper from line 3 to end

So command will be just:

sed -ne '3,${s/ |[^|]*$//;s/^.*| //p;}'

Or

sed -ne '
    3,${
        s/ |[^|]*$//;
        s/^.*| //p;
    }
'

But if the field have to begin by plan, things become simplier:

sed -ne 's/.*\(Plan.*[^[:space:]]\)[[:space:]]*|[^|]*$/\1/p'

edited Jul 31 '23 at 12:50

answered Jul 31 '23 at 12:29

techno

525
3
13

dawg · Answer 4 · 2023-07-31T15:29:01.580

Here is a Ruby that allows selecting columns by name of the column:

ruby  -e '
h=Hash.new()
desired="Plan"
$<.read.split(/^\+[-+]+\+$\R/).
    select{|l| l[/\S+/]}.
    map{|l| l.split(/\n/)}.flatten.
    map{|sl| sl.split(/\s*\|\s*/)[1..]}.transpose.
    each{|a| h[a[0]]=a[1..]}
puts h[desired].join("\n")
' file

Or, this awk:

awk -v d="Plan" '
BEGIN{FS="[[:blank:]]*[|][[:blank:]]*"; idx=0}
FNR==2{for(i=1; i<=NF; i++) if (d==$i) idx=i; next}
NR>1 {print $idx}
' file

Either prints:

PlanA BGP40
PlanB BGP30
PlanA BGP10
Plan BGP90

You can print any other column by changing desired="Plan" to the desired name at the top of the column.

Freeman · Answer 5 · 2023-07-31T20:57:26.387

I think it's very easy, check this out :

awk -F "|" 'NR>3 && NR<8 {print $7}' file.txt

or another nice solution with the combination of sed and datamash

sed '1,3d;$d' file.txt | datamash -t '|' cut 7

in sed

'1,3d;$d': to delete lines 1 to 3 (1,3d) and the last line ($d) of the file.

in datamash

-t "|": Set the delimiter as "|" (pipe symbol).
cut 7: Select and display only the 7th column (Plan column).

output

PlanA BGP40 
PlanB BGP30 
PlanA BGP10 
Plan BGP90

awk how to print one column only but other columns have more than one word

5 Answers5

using sedsimply:

using `sed`simply: