Removing hosts from a comma-delimited file

Question

I am trying to script a way of removing hosts from the hostgroup file in Nagios Core. The format of the hostgroup file is:

server1,server2,server3,server4

When removing a server, I need to remove not only the server name but also the comma that follows it. So in my example above, if I remove server2, the file should end up as follows:

server1,server3,server4

After some googling and testing, I found the following, which removes server2 and the comma after it (I don't know exactly what the \b is for):

sed -i 's/\bserver2\b,//g' myfile

What I want is to feed a list of hostnames to a small script that removes a bunch of hosts (and their trailing commas), with something like the following. The problem is that using a variable like $x breaks the script, so that nothing happens.

#!/bin/ksh
for x in `cat /tmp/list`
do
sed -i 's/\b${x}\b,//g' myfile
done

I think I am very close on a solution here, but could use a little help. Thanks much in advance for your kind assistance.

Not related to your problem with ${x}, but "for x in `cat /tmp/list`" works, but preferred is "while read -r x ... done < /tmp/list" - avoids spawning another process as well as avoids useless use of cat — Ian McGowan, Apr 21 '20 at 03:36
More on reading a file a line at a time in bash: https://mywiki.wooledge.org/BashFAQ/001 — Shawn, Apr 21 '20 at 07:29

Ian McGowan · Answer 1 · 2020-04-21T03:37:56.507

Single quotes tell the shell not to expand ${x}; they turn off variable interpolation, if you want a term to google. See https://www.tldp.org/LDP/abs/html/quotingvar.html. Use double quotes around the sed expression instead:

while read -r x; do sed -i "s/\b${x},\b//g" myfile; done < /tmp/list
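One way to see the difference is to print the sed expression exactly as the shell hands it over. This is only an illustrative sketch: the function names show_single and show_double are made up here, and x is set by hand rather than read from /tmp/list.

```shell
# Hypothetical helpers: print the sed expression as sed would receive it.
x=server2

# Single quotes: ${x} stays literal, so sed never sees the hostname.
show_single() { printf '%s\n' 's/\b${x}\b,//g'; }

# Double quotes: the shell expands ${x} before sed runs.
show_double() { printf '%s\n' "s/\b${x}\b,//g"; }

show_single   # prints: s/\b${x}\b,//g
show_double   # prints: s/\bserver2\b,//g
```

Inside double quotes the backslash in \b is left alone, because the shell only treats a backslash specially there when it precedes $, `, ", \ or a newline.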

But since the last field won't have a comma after it, it might be a good idea to run two sed commands: one looking for \bword,\b and the other for ,word$, where \b is a word boundary and $ is the end of the line.

while read -r x; do sed -i "s/\b${x},\b//g" myfile; sed -i "s/,${x}$//" myfile ; done < /tmp/list

One other possible boundary condition: what if server2 is on a line by itself and that's what you're trying to delete? Perhaps add a third sed, but note that this one leaves a blank line behind, which you might want to remove:

while read -r x
do
  sed -i "s/\b${x},\b//g" myfile  # find and delete word,
  sed -i "s/,${x}$//" myfile      # find and delete ,word
  sed -i "s/^${x}$//" myfile      # find word on a line by itself
done < /tmp/list
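The three cases can also be folded into one sed call per host. The following is a minimal sketch, assuming GNU sed (for \b and -i); the function name remove_host is hypothetical, and the third expression uses the d command instead of a substitution so that no blank line is left behind. Hostnames are still interpreted as regular expressions, so a dot in a name matches any character.

```shell
# Hypothetical helper: remove one host from a comma-delimited file in place.
# The ( ) body runs in a subshell, so host and file stay local to the function.
remove_host() (
  host=$1
  file=$2
  sed -i \
    -e "s/\b${host},\b//g" \
    -e "s/,${host}\$//" \
    -e "/^${host}\$/d" \
    "$file"
)

# Demo on a throwaway file.
demo=$(mktemp)
printf '%s\n' 'server1,server2,server3,server4' 'server2' 'server22,server2' > "$demo"
remove_host server2 "$demo"
cat "$demo"   # prints: server1,server3,server4 then server22
rm -f "$demo"
```

To process a whole list, the function can be called from the same loop: while read -r x; do remove_host "$x" myfile; done < /tmp/list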

score 2 · Answer 2 · answered Apr 21 '20 at 09:54

This works quite nicely:

#!/bin/bash
IN_FILE=$1
shift
# "$*" joins the remaining arguments into one string for the bracket expression
sed -i "s/\bserver[$*],*\b//g" "$IN_FILE"
sed -i "s/,$//g" "$IN_FILE"

If you invoke it as ./remove_server.sh myfile "1 4" on your example file containing server1,server2,server3,server4, you get the following output:

server2,server3

A quick explanation of what it does:

shift shifts the arguments down by one (making sure that "myfile" isn't fed into the regex)
The first sed removes every serverN whose single-character suffix N appears in the argument string (e.g. "1 4"), along with any commas that follow it
The second sed looks for a trailing comma and removes it
\b matches a word boundary
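A bracket expression matches exactly one character, so this approach is limited to single-character suffixes, and a space inside the quoted argument becomes part of the set as well. As a hedged alternative, the suffixes could be joined into an alternation instead. The sketch below assumes GNU sed (-E together with \b); the function name remove_numbered is hypothetical, and it prints to standard output rather than editing in place.

```shell
# Hypothetical helper: remove server<suffix> entries, one suffix per argument.
# The ( ) body runs in a subshell, so the IFS change does not leak out.
remove_numbered() (
  file=$1
  shift
  IFS='|'
  alt="$*"                      # e.g. "1|4|12"
  sed -E -e "s/\bserver(${alt})\b,?//g" -e 's/,$//' "$file"
)

# Demo on a throwaway file.
demo=$(mktemp)
printf '%s\n' '# server info:' 'server1,server2,server3,server4,server12' > "$demo"
remove_numbered "$demo" 1 4   # prints: # server info: then server2,server3,server12
rm -f "$demo"
```

Here server12 survives because \b requires a word boundary directly after the suffix, and the comment line is untouched because no space is part of the pattern.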

This is a great resource for learning about and testing regex: https://regex101.com/r/FxmjO5/1. I would recommend you check it out and use it each time you have a regex problem. It's helped me on so many occasions!

An example of this script working in a more general sense:

I tried it out on this file:

# This is some file containing server info:
# Here are some servers:
server2,server3

# And here are more servers:
server7,server9

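with ./remove_server.sh myfile "2 9" and got this (the word "server" in the first comment line is removed too, because the space in "2 9" becomes part of the bracket expression):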

# This is some file containing info:
# Here are some servers:
server3

# And here are more servers:
server7

Jetchisel · Answer 3 · 2020-04-22T02:06:48.520

Pretty sure there is a pure sed solution for this but here is a script.

#!/usr/bin/env bash

hosts=()

while read -r host; do
  hosts+=("s/\b$host,\{,1\}\b//g")
done < /tmp/list

opt=$(IFS=';' ; printf '%s' "${hosts[*]};s/,$//")

sed "$opt" myfile

This runs sed only once rather than once per host, so if you have 20+ patterns to remove, sed still runs a single time.
Add -i once you are happy with the output.
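To make the single-invocation idea concrete, it can help to print the combined sed script before running it. The following is a rough sketch under a few assumptions: the function name build_sed_script is hypothetical, it accumulates a plain string instead of a bash array, and it writes the optional comma as \{0,1\}, the POSIX spelling of the same interval.

```shell
# Hypothetical helper: print one sed script that removes every host given
# as an argument. The ( ) body runs in a subshell, keeping variables local.
build_sed_script() (
  script=
  for h in "$@"; do
    script="${script}s/\b${h},\{0,1\}\b//g;"
  done
  # Final command strips a comma left dangling at the end of a line.
  printf '%s\n' "${script}s/,\$//"
)

build_sed_script server1 server4
# prints: s/\bserver1,\{0,1\}\b//g;s/\bserver4,\{0,1\}\b//g;s/,$//

printf 'server1,server2,server3,server4\n' | sed "$(build_sed_script server1 server4)"
# prints: server2,server3
```

However many hosts are passed in, sed reads the file once and applies every substitution to each line.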

edited Apr 22 '20 at 02:06

answered Apr 21 '20 at 03:18

Nice! It's a good point about multiple invocations - if you have a delete file with hundreds of entries and a host file with thousands, the naïve solution I gave will be much slower than this answer... – Ian McGowan Apr 21 '20 at 03:41

James Brown · Answer 4 · 2020-04-21T04:14:55.657

Using perl and regex by setting the servers to a regex group in a shell variable:

$ remove="(server1|server4)"
$ perl -p -e  "s/(^|,)$remove(?=(,|$))//g;s/^,//" file
server2,server3

Explained:

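remove="(server1|server4)" sets the pattern; it could also be "server1" or even "server."
"s/(^|,)$remove(?=(,|$))//g" is double-quoted so the shell expands $remove; it deletes each match together with its leading comma (or at the start of the line), provided the match is followed by a comma or the end of the line
s/^,// removes the leading comma that is left behind if the first entry was deleted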

Use the -i switch for in-place editing.

Shawn · Answer 5 · 2020-04-21T07:50:59.720

A bash script that reads the servers to remove from standard input, one per line, and uses perl to remove them from the host file (passed as the first argument to the script):

#!/usr/bin/env bash
# Usage: removehost.sh hostgroupfile < listfile

mapfile -t -u 0 servers
IFS="|"
export removals="${servers[*]}"
perl -pi -e 's/,?(?:$ENV{removals})\b//g; s/^,//' "$1"

It reads the servers to remove into an array, joins that into a pipe-separated string, and then uses that in the perl regular expression to remove all the servers in a single pass through the file. Slashes and other funky characters (as long as they're not RE metacharacters) won't mess up the parsing of the perl code, because it reads the environment variable instead of embedding the string directly. It also uses a word boundary so that removing server2 won't remove that part of server22.
