I have an IP blocking script that pulls both IP addresses and address ranges from 3 sources, merges them into a single file, then imports them with ipset and iptables using some code "borrowed" from OpenWrt's BanIP. It uses awk to filter the lines through a regex to get clean output, then prefixes each address with some text to build an ipset hash that iptables can match against.
# download sources
####################################################################################################
# manually added ips
##################################################
cat <<IPS > /tmp/ips.txt
#
# ip address 1
198.54.xxx.xxx
#
IPS
# sources from links (main)
##################################################
cat <<IPS2 | sed 's/#.*$//g' | tr "\n" " " | xargs curl --retry 5 -s >> /tmp/ips.txt && echo "Downloaded IP lists 1/2"
#
# DoH providers used to circumvent hosts blocking
https://raw.githubusercontent.com/dibdot/DoH-IP-blocklists/master/doh-ipv{4,6}.txt
# Full bogons
https://www.team-cymru.org/Services/Bogons/fullbogons-ipv{4,6}.txt
# Firehol level 1 and 2
https://iplists.firehol.org/files/firehol_level{1,2}.netset
# country ip allocations
https://stat.ripe.net/data/country-resource-list/data.json?resource=ng
# bots/spammers recent
https://iplists.firehol.org/files/firehol_abusers_1d.netset
#
IPS2
# sources from whois
##################################################
{
#
# social login providers
# facebook ip class
whois -h whois.radb.net -- '-i origin AS32934'
# twitter ip class
whois -h whois.radb.net -- '-i origin AS13414'
# apple ip class
whois -h whois.radb.net -- '-i origin AS714'
#
} | grep route >> /tmp/ips.txt && echo "Downloaded IP lists 2/2"
# create blocklists, awk arguments are taken from OpenWrt's BanIP
####################################################################################################
awk 'NR==1{print "create ipmaster hash:net family inet hashsize 750000 maxelem 1000000"}''/^(([0-9]{1,3}\.){3}[0-9]{1,3}(\/[0-9]{1,2})?)([[:space:]]|$)/{print "add ipmaster "$1}' /tmp/ips.txt | ipset restore -!
awk 'NR==1{print "create ipmaster6 hash:net family inet6 hashsize 750000 maxelem 1000000"}''/^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}(:\/[0-9]{1,2})?([[:space:]]|$)/{print "add ipmaster6 "$1}' /tmp/ips.txt | ipset restore -!
# import blocklists
iptables -v -I INPUT -m set --match-set ipmaster src -j DROP
ip6tables -v -I INPUT -m set --match-set ipmaster6 src -j DROP
echo "$(ipset save ipmaster | grep -c '^add ') blocked IPv4 IPs"
echo "$(ipset save ipmaster6 | grep -c '^add ') blocked IPv6 IPs"
# clean up
rm /tmp/ips.txt
The problem, however, is that the awk regex filter is not "catching" some IP addresses, and in some cases it drops the "/" prefix-length suffix from a range.
For instance, it omits lines like this one altogether:
"2001:4270::/32",
The same happens with any other address wrapped in quotes or preceded by other text.
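A minimal reproduction of the symptom, as a sketch under a few assumptions: the sample lines are made up to mimic the three shapes the sources seem to produce (plain list, quoted JSON, whois "route:" output), the variable names are illustrative, and the two patterns from the script are run through grep -E instead of awk on the assumption that both read them as the same extended regex.

```shell
# Made-up sample lines: plain entries, quoted JSON entries, whois route lines.
sample='198.51.100.7
203.0.113.0/24
2001:db8::/32
    "2001:4270::/32",
    "192.0.2.0/24",
route:      198.51.100.0/24
route6:     2001:db8:1::/48'

# The two patterns from the script, minus the awk delimiters.
# Both start with ^, so the address must be the first thing on the line.
v4='^(([0-9]{1,3}\.){3}[0-9]{1,3}(/[0-9]{1,2})?)([[:space:]]|$)'
v6='^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}(:/[0-9]{1,2})?([[:space:]]|$)'

# Count lines accepted by either pattern, and lines rejected by both.
matched_count=$(printf '%s\n' "$sample" | grep -cE -e "$v4" -e "$v6")
missed_count=$(printf '%s\n' "$sample" | grep -cvE -e "$v4" -e "$v6")

echo "matched: $matched_count, missed: $missed_count"
# Show exactly which lines fall through.
printf '%s\n' "$sample" | grep -vE -e "$v4" -e "$v6"
```

With this sample, only the three plain lines match. The quoted lines and the whois lines are rejected because the address is not at the very start of the line.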
Would you suggest fixing the awk with a better regex, and if so, what should I use? I have exactly zero knowledge of regex rules.
I do not need the IP addresses to be validated; iptables can take errors with no problem, and I think it validates them anyway. So what I actually need is a regex that is extremely simple, as simple as possible.
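For illustration, one possible reading of "as simple as possible" is a loose, unanchored match that pulls the address out from wherever it sits on the line. The sketch below uses grep -oE instead of awk; the sample lines, the pattern names (v4_loose, v6_loose) and the variable names are all hypothetical, and it has not been run against the real lists. A pattern this loose would presumably also pick up address-like strings such as timestamps or version numbers.

```shell
# Made-up sample lines in the shapes the sources seem to produce.
sample='198.51.100.7
    "2001:4270::/32",
    "192.0.2.0/24",
route:      198.51.100.0/24
route6:     2001:db8:1::/48'

# Loose IPv4: four dot-separated number groups, optional /prefix.
v4_loose='([0-9]{1,3}\.){3}[0-9]{1,3}(/[0-9]{1,2})?'
# Loose IPv6: hex digits and colons with at least two colons, optional /prefix.
v6_loose='[0-9a-fA-F]*:[0-9a-fA-F:]*:[0-9a-fA-F:]*(/[0-9]{1,3})?'

# -o prints only the matched part, so quotes, commas and "route:" are left behind.
ipv4=$(printf '%s\n' "$sample" | grep -oE "$v4_loose")
ipv6=$(printf '%s\n' "$sample" | grep -oE "$v6_loose")

# Prefix each entry the same way the awk step does (printed here, not piped to ipset).
printf '%s\n' "$ipv4" | sed 's/^/add ipmaster /'
printf '%s\n' "$ipv6" | sed 's/^/add ipmaster6 /'
```

Whether ipset would tolerate any stray non-address matches is an open question here, so the output is only printed rather than fed to ipset restore.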
Or should I use an external Python library or Perl? The script will have access to both.