Using sed to match on multiple patterns with one expression, and delete until a blank line

Question

On a RHEL 6.6 system, using ifconfig and GNU sed, I want to display only the Ethernet interfaces which aren't logical sub interfaces, or the loopback.

For example, the output should not contain interface records where the interface name is like eth0:134 or lo.

My approach so far has been to use sed with two expressions. The first, /eth[0-9]:/,/^$/d, matches any line containing ethN: and deletes it along with every following line up to and including the next blank line. The second, /lo/,/^$/d, does the same for lines containing lo.

For example:

[user@system ~]$ ifconfig -a | sed '/eth[0-9]:/,/^$/d; /lo/,/^$/d'


eth0     Link encap:Ethernet HWaddr 00:11:22:33:44:55
         inet addr:192.168.0.50 Bcast: 192.168.0.255 Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
         RX packets:1024 ERRORS:0 DROPPED:0 OVERRUNS:0 FRAME:0
         TX packets:2048 ERRORS:0 DROPPED:0 OVERRUNS:0 FRAME:0
         collisions:0 txqueuelen:1000
         RX bytes:6455319 (6.1 MiB)  TX bytes: 258478  (252.4 KiB)

Un-desired output looks like:

eth0:146 Link encap:Ethernet HWaddr 00:11:22:33:44:55
         inet addr:192.168.0.51 Bcast: 192.168.0.255 Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

eth0:147 Link encap:Ethernet HWaddr 00:11:22:33:44:55
         inet addr:192.168.0.52 Bcast: 192.168.0.255 Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric

eth0:148 Link encap:Ethernet HWaddr 00:11:22:33:44:55
         inet addr:192.168.0.53 Bcast: 192.168.0.255 Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric

lo       Link encap:Local Loopback
         inet addr:127.0.0.1 Mask:255.0.0.0
         UP LOOPBACK RUNNING MTU:16436 Metric:1
         RX packets:605 errors:0 dropped:0 overruns:0 frame:0
         TX packets:605 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:59008  (57.6 KiB)  TX bytes:59008  (57.6 KiB)

I like this method of deleting everything from the matched line through the next blank line (^$) because the number of extra lines after the interface-name line varies: either 2 or 6 additional lines in this case.

This method allows there to be N additional lines of output as long as a blank line is still used as a separator between displayed interface records.

How can the second expression, /lo/,/^$/d, be combined with the first?

Perhaps another approach to how the lines are matched (or not matched) is better?

Another issue is that this only matches the first 10 interfaces. There aren't more than 10, but it would be good to account for that in case there are.

I'd like to match on the first 100 interfaces with something like:

^[1-9][0-9]?$|^100$

Solutions using awk are ok as well.

Per Ed Morton's comment, the UN-desired output contains all of the same lines, with just the interface name in the first column changing to include the ethN: and a numeric value, or the string lo, followed by 6 lines of additional output that's specific to that particular interface. — Chris, Jul 12 '16 at 22:08
Edited to include un-wanted output and additional explanation. Thanks for the feedback. — Chris, Jul 12 '16 at 23:04
I was really hoping for simply "input" and "wanted output" - that gives us something to easily test against rather than having to piece together an input file from "wanted output" and "unwanted output" and make assumptions. Having said that, what you've posted now does help so I've updated my answer. — Ed Morton, Jul 12 '16 at 23:05

John1024 · Accepted Answer · 2016-07-13T17:17:44.427

3

Try:

ifconfig -a | sed -r '/(eth[0-9]{1,2}:|eth100:|lo)/,/^$/d'

{1,2} means one or two of the preceding element. So eth[0-9]{1,2} matches eth followed by one or two digits.

(A|B|C) matches A, B, or C. So (eth[0-9]{1,2}:|eth100:|lo) matches eth followed by one or two digits and a colon, or eth100 followed by a colon, or lo.
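To see the combined address range in action without a live ifconfig, here is a minimal sketch that feeds a shortened, made-up sample through the same expression. The `sample` and `result` variables and the sample text are assumptions for illustration, and `-r` assumes GNU sed.

```shell
# Hypothetical, shortened ifconfig-style records separated by blank lines.
sample='eth0      Link encap:Ethernet
          inet addr:192.168.0.50

eth0:146  Link encap:Ethernet
          inet addr:192.168.0.51

lo        Link encap:Local Loopback
          inet addr:127.0.0.1
'

# Delete from any line matching ethN:, ethNN:, eth100: or lo
# through the next blank line (GNU sed, extended regex).
result=$(printf '%s\n' "$sample" | sed -r '/(eth[0-9]{1,2}:|eth100:|lo)/,/^$/d')

printf '%s\n' "$result"
```

Only the eth0 record should remain; the alias and loopback records are removed together with their trailing blank lines.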

This uses -r to enable extended regular expressions (ERE). Without -r, sed defaults to basic regular expressions (BRE). In GNU sed, a BRE can express the same pattern, at the cost of extra backslashes:

ifconfig -a | sed '/\(eth[0-9]\{1,2\}:\|eth100:\|lo\)/,/^$/d'

BSD/OSX

BSD (OSX) sed does not recognize the -r option. To get extended regex, use -E instead:

ifconfig -a | sed -E '/(eth[0-9]{1,2}:|eth100:|lo)/,/^$/d'

-E will also work with recent versions of GNU sed.
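Because support for `-E` versus `-r` differs between sed builds, a script could probe for a working flag before using it. The following is a sketch only; `ere_flag` is a hypothetical variable name, and the probe simply checks whether sed exits successfully with each flag.

```shell
# Probe which extended-regex flag this sed accepts.
# ere_flag is a hypothetical variable name used only in this sketch.
if printf 'x\n' | sed -E 's/(x)/\1/' >/dev/null 2>&1; then
    ere_flag=-E
elif printf 'x\n' | sed -r 's/(x)/\1/' >/dev/null 2>&1; then
    ere_flag=-r
else
    ere_flag=    # neither flag works; fall back to the BRE form instead
fi

echo "extended-regex flag: ${ere_flag:-none}"
```

When a flag is found, it could then be passed along as `ifconfig -a | sed "$ere_flag" '/(eth[0-9]{1,2}:|eth100:|lo)/,/^$/d'`.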

edited Jul 13 '16 at 17:17

answered Jul 12 '16 at 21:11

John1024


That's a nice trick with the 1 or 2 of the preceding. Thank you for the example. – Chris Jul 12 '16 at 22:33
@A.Danischewski Thanks. On my system, the `lo` line for ifconfig _does_ have a colon. I removed the colon, as per your suggestion, just in case. On the other issue, only recent versions of GNU sed support the `-E` flag (and it is _still_ undocumented). The OP indicated that he was using RHEL 6 and it isn't clear to me that that is new enough to support `-E`. – John1024 Jul 13 '16 at 17:20
The version of sed on my RHEL 6.6 box is GNU sed version 4.2.1. According to the man page, it does not support the '-E' option. However, this box is in an 'Air-Gapped' environment, and may not have the latest version available. – Chris Jul 13 '16 at 18:24
I can confirm that '-E' does work on a box with GNU sed version 4.2.1, but it is undocumented and not listed as an option in the man page. '-E' does NOT work on a RHEL 5.10 box with GNU sed version 4.1.5; there, the command provided by John1024 just prints sed's usage message. – Chris Jul 14 '16 at 14:00
@ChrisSmith Thanks for doing that research! That is why, in the answer, I didn't recommend `-E` for you. Just to confirm, though, do the other commands, the ones _not_ using `-E`, work for you? – John1024 Jul 14 '16 at 19:08
idk if it can happen but that would fail if the text `lo` appears anywhere in the file other than as the interface name, including if it appears mid-word (e.g. `below`). – Ed Morton Jul 14 '16 at 20:01
@EdMorton There are, of course, as you know, many solutions for that and I would have been concerned about the issue except that the OP reported that the regexes were working for him. His question did not ask for improved regexes but rather he asked about how to _combine_ the two regexes into one regex. Given that I don't have access to a RHEL6.6 box to test it on, I wasn't going to change the patterns unless there was a reported problem with them and sufficient sample output from his `ifconfig` to verify the solution. – John1024 Jul 14 '16 at 20:12
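One possible way to address the substring concern raised in the comments above is to anchor the pattern to the start of the line, so that only the interface-name column is tested. This is a sketch, not something posted by the answer's author: it assumes GNU sed with `-r`, assumes alias names look like `ethN:M`, and uses made-up sample text and a hypothetical `anchored` variable.

```shell
# Made-up records; the second line of eth1 contains "lo" mid-word ("below").
sample='eth1      Link encap:Ethernet
          note: see below

eth1:7    Link encap:Ethernet
          inet addr:10.0.0.7

lo        Link encap:Local Loopback
          inet addr:127.0.0.1
'

# ^ anchors the match to the interface-name column, and [[:space:]]
# requires the name to end right after "lo" or after the alias number.
anchored=$(printf '%s\n' "$sample" |
    sed -r '/^(eth[0-9]+:[0-9]+|lo)[[:space:]]/,/^$/d')

printf '%s\n' "$anchored"
```

Here the eth1 record survives intact even though one of its lines contains `below`; with the unanchored pattern, that line would start a deletion range of its own.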

Ed Morton · Answer 2 · 2016-07-14T17:56:59.183

2

It sounds like all you need is:

awk -v RS= -v ORS='\n\n' '$1~/^eth[0-9]+$/'

e.g.:

$ cat file
eth0:146 Link encap:Ethernet HWaddr 00:11:22:33:44:55
         inet addr:192.168.0.51 Bcast: 192.168.0.255 Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

eth0     Link encap:Ethernet HWaddr 00:11:22:33:44:55
         inet addr:192.168.0.50 Bcast: 192.168.0.255 Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
         RX packets:1024 ERRORS:0 DROPPED:0 OVERRUNS:0 FRAME:0
         TX packets:2048 ERRORS:0 DROPPED:0 OVERRUNS:0 FRAME:0
         collisions:0 txqueuelen:1000
         RX bytes:6455319 (6.1 MiB)  TX bytes: 258478  (252.4 KiB)

eth0:147 Link encap:Ethernet HWaddr 00:11:22:33:44:55
         inet addr:192.168.0.52 Bcast: 192.168.0.255 Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric

eth0:148 Link encap:Ethernet HWaddr 00:11:22:33:44:55
         inet addr:192.168.0.53 Bcast: 192.168.0.255 Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric

lo       Link encap:Local Loopback
         inet addr:127.0.0.1 Mask:255.0.0.0
         UP LOOPBACK RUNNING MTU:16436 Metric:1
         RX packets:605 errors:0 dropped:0 overruns:0 frame:0
         TX packets:605 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:59008  (57.6 KiB)  TX bytes:59008  (57.6 KiB)

$ awk -v RS= -v ORS='\n\n' '$1~/^eth[0-9]+$/' file
eth0     Link encap:Ethernet HWaddr 00:11:22:33:44:55
         inet addr:192.168.0.50 Bcast: 192.168.0.255 Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
         RX packets:1024 ERRORS:0 DROPPED:0 OVERRUNS:0 FRAME:0
         TX packets:2048 ERRORS:0 DROPPED:0 OVERRUNS:0 FRAME:0
         collisions:0 txqueuelen:1000
         RX bytes:6455319 (6.1 MiB)  TX bytes: 258478  (252.4 KiB)

If you only want to match interface numbers 0 to 100 just tweak it to:

awk -v RS= -v ORS='\n\n' '$1~/^eth([1-9]?[0-9]|100)$/'
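As an alternative sketch (not from the answer itself), the range check can also be done numerically: accept any all-digit suffix with the regexp, then compare the number after `eth` against the limit. The sample records and the `kept` variable are made up for illustration.

```shell
# Made-up, one-line records separated by blank lines.
sample='eth7      Link encap:Ethernet

eth100    Link encap:Ethernet

eth101    Link encap:Ethernet

eth7:3    Link encap:Ethernet
'

# Paragraph mode (RS=): each blank-line-separated block is one awk record.
# substr($1, 4) is the text after "eth"; adding 0 forces a numeric value.
kept=$(printf '%s\n' "$sample" |
    awk -v RS= -v ORS='\n\n' '($1 ~ /^eth[0-9]+$/) && (substr($1, 4) + 0 <= 100)')

printf '%s\n' "$kept"
```

This keeps the eth7 and eth100 records and drops eth101 and the eth7:3 alias. Changing the limit then only means editing the number, not the regexp.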

edited Jul 14 '16 at 17:56

answered Jul 12 '16 at 21:39

Ed Morton


Help me understand what the '+$' here does? Does that stop matching immediately after the [0-9] ? I'm missing the concept of how this matches on a one digit field, or a two digit field, but not a one or two digit field followed by a colon ':' character. Thanks – Chris Jul 14 '16 at 14:23
Trying to add `{1,2}` does not match any of the output. For example, `ifconfig -a | awk -v RS= -v ORS='\n\n' '$1 ~ /^eth[0-9]{1,2}+$/'` returns nothing. I think this works and will match a numeric range of 0-99: `awk -v RS= -v ORS='\n\n' '$1~/^eth[0-9]|[0-9][0-9]|100/ && $1 !~ /:/'`. However, I'm back to using two expressions, and as @A.Danischewski pointed out, there may be a substantial IO penalty. – Chris Jul 14 '16 at 14:39
`+` means `1 or more repetitions of the preceding regexp element` and `$` means `end of string` so `[0-9]+$` means `1 or more digits immediately followed by the end of string` so it'll match any sequence of digits but will not match a digit followed by any other character, including `:`. – Ed Morton Jul 14 '16 at 17:42
Why are you trying to change it - it works as-is, right? `{1,2}` means `1 or 2 repetitions of the preceding regexp element`. `+` means `1 or more repetitions of the preceding regexp element`. I don't know what you think `{1,2}+` might mean or what the regexp engine will make of it so I'm not surprised you get no output using it. There is no IO penalty for using a compound condition (`/a/ && /b/`) vs a single regexp (`/a.*b|b.*a/`) as there is no difference in the IO, just in the condition being tested. idk but A.Danichewski was maybe talking about chains of sed commands but not about awk. – Ed Morton Jul 14 '16 at 17:46
If you're trying to restrict it to just numbers 0-100 I've added an example of that at the end of my answer. – Ed Morton Jul 14 '16 at 17:57
No change is necessary, only an increase in my own understanding of the +$, gained from your explanations above :) Thank you. – Chris Jul 19 '16 at 14:29
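The `+$` explanation from the comments above can be checked with a small sketch that tests a few interface names against the anchored pattern. The list of names and the `report` variable are chosen for illustration only.

```shell
# Test a few interface names against the anchored pattern from the answer.
# awk exits 0 when the name matches and 1 when it does not.
report=$(
    for name in eth0 eth12 eth0:146 lo; do
        if printf '%s\n' "$name" |
            awk '$1 ~ /^eth[0-9]+$/ { found = 1 } END { exit !found }'; then
            echo "$name: match"
        else
            echo "$name: no match"
        fi
    done
)

printf '%s\n' "$report"
```

This reports a match for eth0 and eth12 and no match for eth0:146 and lo, because `$` requires the string to end immediately after the digits.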
