
Backstory: I am writing a script that checks a blacklist of domains and works out which ones are still valid (resolve to an IP), so I can cut the old non-resolving domains from the list. The list has millions of lines, so I am using awk (instead of a "while read" loop) for speed.

I am trying to write an awk pipeline that runs nslookup over a list of domains and prints only the resolvable domains to another file.

I am almost finished, except I am stuck on one part: how can I specify the server that nslookup uses?

I have -port=54 working, but I am also trying to configure which DNS server nslookup uses.

awk '{print $1}' /etc/pihole/gravity.list | nslookup -port=54| awk '/[Nn]ame/ {print $NF}'  >> /etc/pihole/gravityProcessed.list

If I try to specify -server=, nslookup rejects it as an invalid parameter. If I pass 1.1.1.1 as an argument, then instead of using 1.1.1.1 as the server, nslookup tries to look up 1.1.1.1 itself.

awk '{print $1}' /etc/pihole/gravity.list | nslookup 1.1.1.1 | awk '/[Nn]ame/ {print $NF}'  >> /etc/pihole/gravityProcessed.list

The issue is that nslookup doesn't have a -server parameter as far as I know (though it does have a -port parameter), so I need awk to do:

nslookup [INSERT HOST] server -port=

Here is a sample of /etc/pihole/gravity.list

google.com
yahoo.com
skype.com
microsoft.com

The other thing I would like to incorporate is a regex for proper domain syntax, as the script currently dies if it hits a domain that isn't formatted properly. Something like passing the list through grep with (?=^.{4,253}$)(^(?:[a-zA-Z0-9](?:(?:[a-zA-Z0-9\-]){0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$)
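A minimal sketch of that filtering step, assuming GNU grep built with PCRE support (the `-P` flag, needed for the lookahead). The function name `valid_domains` and the sample inputs are made up for illustration; the pattern is the one quoted above.

```shell
# Hypothetical helper: pass through only lines that look like well-formed domain names.
# Assumes GNU grep with PCRE support (-P); the pattern is the one from the question.
valid_domains() {
    grep -P '(?=^.{4,253}$)(^(?:[a-zA-Z0-9](?:(?:[a-zA-Z0-9\-]){0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$)'
}

# Made-up sample input: two well-formed names and two malformed ones.
# Prints google.com and sub.example.org, one per line.
printf '%s\n' google.com 'bad domain' -leading.example sub.example.org | valid_domains
```

Lines ending in a carriage return (files saved with Windows line endings) would not match this pattern, so such a list would probably need to be cleaned first.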

p1r473
  • Why are you using `awk` at the beginning of the pipeline when you could be using `cat`? – Spencer Oct 03 '19 at 20:08
  • The file I am trying to parse is really big: 4 million lines. According to this post, awk will parse it faster than cat: https://askubuntu.com/questions/564944/cat-vs-grep-vs-awk-command-get-the-file-content-which-one-is-more-efficient-and I had tried reading line by line with "while read" and it was super slow. – p1r473 Oct 03 '19 at 20:22
  • Not sure what you want. Something like `nslookup $(awk '{print $1}' /etc/pihole/gravity.list) -port=54` (only when `awk` returns one result) or `awk '{print $1}' /etc/pihole/gravity.list | xargs -L1 -I{} nslookup {} -port=54`? – Walter A Oct 03 '19 at 20:27
  • @Spencer I doubt they should be using `cat` (google UUOC). If nslookup needs whole lines from gravity.list, then either nslookup can read a file on its own or input can be redirected with `<`. p1* - see [why-is-using-a-shell-loop-to-process-text-considered-bad-practice](https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice) for why a while read is super slow. – Ed Morton Oct 03 '19 at 20:29
  • Are you trying to call nslookup 4 million times, and parse the results 4 million times as well? Can you reduce the number of lookups with `cut -f1 /etc/pihole/gravity.list | sort -u` (perhaps add `-d " "` when fields are separated by spaces)? – Walter A Oct 03 '19 at 20:31
  • @p1r473 if you post a small sample of gravity.list and what you need to extract from it to pass to nslookup, then we can help with that part. Similarly, if you post the output from nslookup and what you want to print given that, then we can help with that part too. – Ed Morton Oct 03 '19 at 20:34
  • `nslookup` doesn't get the name to look for from standard input, why are you piping to `nslookup`? – Barmar Oct 03 '19 at 20:57
  • @Walter Yes, I am trying to call nslookup 4 million times. The PiHole application takes many hosts files and blocklists available online and merges them into one mega list. However, many of these lists are unmaintained. I am trying to get rid of the hostnames that no longer resolve to an IP. – p1r473 Oct 03 '19 at 21:15
  • @Barmar I just simply need to get a sublist of hostnames that resolve to an IP. – p1r473 Oct 03 '19 at 21:16
  • The other thing I would like to incorporate is a regex for proper domain syntax, as the script currently dies if it hits a domain that isn't formatted properly. Something like passing the list through grep with (?=^.{4,253}$)(^(?:[a-zA-Z0-9](?:(?:[a-zA-Z0-9\-]){0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$) – p1r473 Oct 03 '19 at 21:19
  • @Walter, I am not familiar with cut. Where in the awk would I put `cut -f1 /etc/pihole/gravity.list | sort -u`? – p1r473 Oct 03 '19 at 21:19
  • @EdMorton gravity.list is simply a list of domains. 1 domain per line. So google.com, google.ca, yahoo.com, etc. – p1r473 Oct 03 '19 at 21:20
  • Great, include a sample of it in your question. – Ed Morton Oct 03 '19 at 21:22
  • @p1r473 use `cut` from the prompt. First try `echo "this is an example"| cut -d" " -f2`, play with it and then try `head -100 /etc/pihole/gravity.list | cut -f1 -d" " | sort -u`. With `head` you can test with a subset. – Walter A Oct 03 '19 at 21:25
  • Isn't what you want `xargs`? Read the output of a command and put it as an argument to another command? – KamilCuk Oct 03 '19 at 21:27
  • @Walter I tried `cut -f1 /etc/pihole/gravity.list | sort -u` and it seemed to just spit out the whole list. Are you trying to remove duplicates? The list is already deduplicated. – p1r473 Oct 03 '19 at 21:29
  • Can't nslookup simply read input for itself though? I posted an "answer" to try to move the question along a bit faster. – Ed Morton Oct 03 '19 at 21:38
  • @Kamil I can't find a way to make xargs append a server parameter, like "1.1.1.1", and pass it to nslookup. – p1r473 Oct 03 '19 at 21:42
  • Other utilities could be easier to parse, e.g. `host` – KamilCuk Oct 03 '19 at 21:42
  • Unfortunately host cannot specify the -port parameter, and I am trying to resolve with a local resolver listening on -port=54. This is why I switched from host to nslookup. The parsing is working; what I am stuck on is sending an extra parameter, e.g. 1.1.1.1 (a resolver), through the awk pipeline. – p1r473 Oct 03 '19 at 21:44

1 Answer


When you want to use an alternative nameserver, see the following excerpt from the nslookup man page:

ARGUMENTS
Interactive mode is entered in the following cases:
1. when no arguments are given (the default name server will be used)
2. when the first argument is a hyphen (-) and the second argument is the host name or Internet address of a name server.

So try

nslookup - 1.1.1.1 < /etc/pihole/gravity.list 2>/dev/null | 
   awk '/[Nn]ame/ {print $NF}' >> /etc/pihole/gravityProcessed.list
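The `/[Nn]ame/` filter above matches any line containing that word, which may include CNAME lines ("canonical name = ...") or error lines for hosts whose own name contains "name". Below is a stricter sketch of the parsing step, assuming answers are printed as `Name:` followed by the host, as BIND's nslookup does. The function name `resolved_names` and the canned sample output are made up for illustration; the addresses are documentation-range placeholders, not real results.

```shell
# Hypothetical helper: read nslookup output on stdin and print each resolved name once.
# Assumes answer lines look like "Name:<whitespace>host", optionally preceded by a "> " prompt.
resolved_names() {
    awk 'NF >= 2 && $(NF-1) == "Name:" && !seen[$NF]++ { print $NF }'
}

# Canned sample standing in for real nslookup output (placeholder addresses).
printf '%s\n' \
    'Server:		1.1.1.1' \
    'Address:	1.1.1.1#53' \
    '' \
    'Non-authoritative answer:' \
    'www.example.com	canonical name = example.com.' \
    'Name:	google.com' \
    'Address: 192.0.2.10' \
    'Name:	google.com' \
    'Address: 2001:db8::10' | resolved_names
```

The `seen` array drops repeats, which can show up when nslookup reports both an IPv4 and an IPv6 answer for the same host. In the real pipeline this function would take the place of the `awk '/[Nn]ame/ {print $NF}'` stage.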
Walter A
  • Amazing, thank you. This worked perfectly. I just forgot the hyphen. Final line is: `awk '{print $1}' /etc/pihole/gravity.list |nslookup - 127.0.0.1 -port=54 < /etc/pihole/gravity.list 2>/dev/null | awk '/[Nn]ame/ {print $NF}' >> /etc/pihole/gravityProcessed.list` Walter, I am new here, but my second question, which is passing this through a regex to verify the domain is valid: should that be a new question? – p1r473 Oct 03 '19 at 21:55
  • It should be a new question. First try the simplified `sed -rn '/^(([a-zA-Z0-9\-]){0,61}[a-zA-Z0-9]?\.)+[a-zA-Z]{2,}$/p' /etc/pihole/gravity.list | nslookup - 1.1.1.1 2>/dev/null | awk '/[Nn]ame/ {print $NF}' >> /etc/pihole/gravityProcessed.list`, perhaps you can work this out. – Walter A Oct 03 '19 at 22:00
  • @p1r473 what do you think `awk '{print $1}' /etc/pihole/gravity.list |nslookup - 127.0.0.1 -port=54 < /etc/pihole/gravity.list` means wrt where `nslookup` is getting its input from? It's important that you understand this. – Ed Morton Oct 03 '19 at 22:06
  • Forget good/bad domains. The point is that **you**, in the command you posted that I copied into my comment, are piping the output of an awk command to the stdin of nslookup (`awk '{print $1}' /etc/pihole/gravity.list |nslookup ...`) and at the same time redirecting the contents of gravity.list to the stdin of nslookup (`nslookup ... < /etc/pihole/gravity.list`). So where do you intend nslookup to actually read its stdin from? I just want to make sure you understand what those commands are doing, so you can see that it doesn't make sense to both pipe into and redirect into a command. – Ed Morton Oct 03 '19 at 22:34
  • wrt bad domains - if that regexp you posted does what you want then you can remove the input redirection and do `grep -P 'that regexp' | nslookup - 1.1.1.1 2>/dev/null | awk '...'` instead. – Ed Morton Oct 03 '19 at 22:41
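To make the pipe-versus-redirect point concrete, here is a small demo, assuming a POSIX sh or bash (zsh with its MULTIOS option enabled behaves differently). The function name `stdin_source_demo` is made up for illustration. When a command is given both a pipe and a `<` redirection on stdin, the redirection is applied after the pipe is connected, so the command reads only the file and the piped data is ignored.

```shell
# Hypothetical demo: give cat both a pipe and a file redirection on stdin.
# Assumes POSIX sh or bash; zsh with MULTIOS enabled behaves differently.
stdin_source_demo() {
    tmp=$(mktemp) || return 1
    printf 'from-file\n' > "$tmp"
    # The pipe is connected first, then "< file" replaces stdin, so cat sees only the file.
    printf 'from-pipe\n' | cat < "$tmp"
    rm -f "$tmp"
}

stdin_source_demo   # prints: from-file
```

Applied to the command quoted in the comments, this suggests the leading `awk '{print $1}' ... |` stage has no effect there, because nslookup reads gravity.list through the redirection instead.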