
Thanks very much for looking at my thread. I want to write a script that reads a VERY LARGE list of domains, checks which ones resolve, and writes only the ones that resolved to another file.

I currently have this in a script:

nslookup < input.txt - 1.1.1.1 -port=53 2>&1 |
awk '
NR==FNR { list[NR] = $0; next }
/^Name:/                { ++numResults; state="found" }
/Non-existent domain/   { ++numResults; state="not found" }
/NXDOMAIN/              { ++numResults; state="not found" }
/No answer/             { ++numResults; state="not found" }
state == "found"        { print list[numResults]; state="" }
' input.txt - >> output.txt

I also tried an extra line:

/[Cc]an.t find/         { ++numResults; state="not found" }

But somehow the rows aren't lining up. For example, adding this line hides total_garbage.com from the output (as far as I can tell, the nslookup result for total_garbage.com does not contain the words 'Can.t find', so I have no idea what's going on).

The problems are:

1. It does not handle the 'Can't find'/'No answer' case (00038a.net is still printed).

2. It does not handle the 'NXDOMAIN' case (total_garbage.com is still printed).

3. It does not handle the 'Name' case (0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info is missing from the output).

4. Lots of newlines are printed at the end (you can see the whitespace in my output).

Sample input to my script:

google.ca
comingsoon.brightside.com
00038a.net
0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info
total_garbage.com

Desired output of my script:

google.ca
comingsoon.brightside.com
0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info

Actual output:

google.ca
comingsoon.brightside.com
00038a.net
total_garbage.com








Output of nslookup < input.txt:

Server:     127.0.0.1
Address:    127.0.0.1#53

Non-authoritative answer:
Name:   google.ca
Address: 216.58.192.131
Server:     127.0.0.1
Address:    127.0.0.1#53

Non-authoritative answer:
comingsoon.brightside.com   canonical name = elb-brightside-17469.aptible.in.
Name:   elb-brightside-17469.aptible.in
Address: 54.86.171.167
Name:   elb-brightside-17469.aptible.in
Address: 54.174.154.102
Server:     127.0.0.1
Address:    127.0.0.1#53

Non-authoritative answer:
*** Can't find 00038a.net: No answer
Server:     127.0.0.1
Address:    127.0.0.1#53

Non-authoritative answer:
Name:   0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info
Address: 178.162.203.226
Name:   0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info
Address: 178.162.203.211
Name:   0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info
Address: 178.162.203.202
Name:   0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info
Address: 85.17.31.122
Name:   0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info
Address: 85.17.31.82
Name:   0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info
Address: 5.79.71.225
Name:   0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info
Address: 5.79.71.205
Name:   0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info
Address: 178.162.217.107
Server:     127.0.0.1
Address:    127.0.0.1#53

** server can't find total_garbage.com: NXDOMAIN
p1r473
  • Just like before - if you edit your question to include concise, testable sample input (presumably the problematic output from `nslookup`) and expected output for the script you want help writing (probably the `awk` script after that pipe from `nslookup`) **THEN** we can best help you do that. – Ed Morton Oct 05 '19 at 17:06
  • The first line in your shell script includes `awk 'END{ print NR }' /etc/pihole/gravity.cleaned | awk '{print $1}'` which means "run an awk script to print the number of lines in gravity.cleaned and then pipe that number to a second awk script to print the first blank-separated field from that number (which doesn't have blank-separated fields since it's a number)". – Ed Morton Oct 05 '19 at 17:11
  • Thanks I have changed it to ```END=$(awk 'END{ print NR }' /etc/pihole/gravity.cleaned)``` and put the inputs again. Thank you for your patience – p1r473 Oct 05 '19 at 17:13
  • Make sure you change it in your question, not just in some local file on your computer that no one reading your question can see. – Ed Morton Oct 05 '19 at 17:14
  • All your scripts are testing for "Name" but there's no "Name" in the sample input you posted and there's no way your awk script is converting `comingsoon.brightside.com` to `elb-brightside-17469.aptible.in` given what you've posted so far. Please make this clear and simple for us to help you with. – Ed Morton Oct 05 '19 at 17:16
  • Hi Ed, Name is the output from the nslookup. ```nslookup comingsoon.brightside.com Server: 127.0.0.1 Address: 127.0.0.1#53 Non-authoritative answer: comingsoon.brightside.com canonical name = elb-brightside-17469.aptible.in. Name: elb-brightside-17469.aptible.in Address: 54.86.171.167 Name: elb-brightside-17469.aptible.in Address: 54.174.154.102``` I have edited this into the question. Thanks for your continued help – p1r473 Oct 05 '19 at 17:19
  • That's what I asked you to post as the sample input for your awk command - the output from nslookup. You posted something else and it's not clear what that is or where it fits into your pipeline of commands. Please edit your question to simply show us concise, testable sample input and expected output for the awk command you need help writing. – Ed Morton Oct 05 '19 at 17:37
  • Oh I thought you meant input into the script. I have edited the question to show the output from nslookup. – p1r473 Oct 05 '19 at 17:49
  • Again, please tidy up your question to simply show us concise, testable sample input and expected output for the awk command you need help writing. Get rid of all that irrelevant stuff that's cluttering it up and making it hard to figure out what it is you need help with. – Ed Morton Oct 05 '19 at 17:58
  • Hi @EdMorton , I have tidied it up a bit. Thank you for your support and patience. – p1r473 Oct 05 '19 at 18:53
  • You're asking for help writing an awk script to parse the output of `nslookup < input.txt` so when I'm asking for you to post the `nslookup` output, I'm not asking to see the output of running `nslookup` on the individual domains that input.txt contains one at a time, I'm simply asking for you to please post the output of `nslookup < input.txt` (with whatever nslookup options you like) as **THAT** is the input for the awk script you want help to write and the text, spacing, order, etc. will not be exactly the same as when you run it one domain at a time. – Ed Morton Oct 05 '19 at 20:30
  • Thanks very much for clarifying. I've added it to the original post (at the bottom). Thank you so much for your help. – p1r473 Oct 05 '19 at 20:53
  • OK, see my updated answer. – Ed Morton Oct 05 '19 at 21:07

1 Answer


Is this what you're trying to do? For testing, this uses `cat nslookup.out |` with the sample you provided rather than running `nslookup ... |` locally, since a local run would produce different output from the output you want the awk script to parse.

$ cat tst.sh
#!/usr/bin/env bash

#nslookup < input.txt 2>&1 |
cat nslookup.out |
awk '
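NR==FNR { list[NR] = $0; next }               # first file: remember each queried domain by line number
/^Name:/                { state="found" }      # this lookup returned at least one address
/[Cc]an\047t find/      { state="not found" }  # \047 is a single quote; matches the "No answer" and NXDOMAIN lines
!NF && (state != "") {                         # first blank line after a result: one lookup is complete
    ++numResults
    if ( state == "found" ) {
        print list[numResults]                 # print the name that was queried, not the CNAME target
    }
    state=""
}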
' input.txt -

$ ./tst.sh
google.ca
comingsoon.brightside.com
0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info
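
One thing to watch with the script above: it finalizes a lookup only when it reaches a blank line, and in the sample output as posted in the question the blank line comes after the next `Server:`/`Address:` header. The last lookup is therefore never finalized, which is harmless for this sample because total_garbage.com did not resolve, but a resolving last domain would likely be dropped. A minimal sketch of a variant follows, assuming every lookup starts with a `Server:` line and that one or more `Name:` lines means the domain resolved. The names `filter_resolved` and `emit` are hypothetical, chosen for illustration.

```shell
# Hypothetical helper: $1 is the domain list (one per line); the batch
# nslookup output is read from stdin. Prints the queried names that resolved.
filter_resolved() {
    awk '
    # Print the current domain if its lookup produced a Name: line.
    function emit() { if (n > 0 && found) print list[n] }

    NR==FNR    { list[NR] = $0; next }      # first file: the queried domains
    /^Server:/ { emit(); ++n; found = 0 }   # a new lookup starts: close the previous one
    /^Name:/   { found = 1 }                # the current lookup has an address record
    END        { emit() }                   # close the final lookup
    ' "$1" -
}
```

It would be used the same way as the script above, for example `cat nslookup.out | filter_resolved input.txt`. Unlike the script above, a later "Can't find" line within the same lookup does not reset the result here, so check which behavior you want.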

Past attempts:

$ cat gravity.list
comingsoon.brightside.com
total_garbage.com
google.com

$ cat tst.sh
#!/usr/bin/env bash

nslookup < gravity.list 2>&1 |
awk '
NR==FNR { list[NR] = $0; next }
/^Name:/                { result = $NF }
/Non-existent domain/   { result = "not found" }
result != "" { print list[++numResults], "->", result; result="" }
' gravity.list -

$ ./tst.sh
comingsoon.brightside.com -> elb-brightside-17469.aptible.in
total_garbage.com -> not found
google.com -> google.com

or this?

$ cat tst.sh
#!/usr/bin/env bash

nslookup < gravity.list 2>&1 |
awk '
NR==FNR { list[NR] = $0; next }
/^Name:/                { ++numResults; state="found" }
/Non-existent domain/   { ++numResults; state="not found" }
state == "found" { print list[numResults]; state="" }
' gravity.list -

$ ./tst.sh
comingsoon.brightside.com
google.com
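
In this last attempt `numResults` advances on every `Name:` line, so it stays aligned with the input list only while each lookup prints exactly one `Name:` line. A CNAME with two addresses, or a domain with several A records, pushes the counter ahead and later names are printed against the wrong results (or as empty lines once the counter runs past the end of the list). A small sketch to check this on your own output, assuming each lookup starts with a `Server:` line; `count_names_per_lookup` is a hypothetical name:

```shell
# Hypothetical helper: reads batch nslookup output on stdin and prints,
# for each lookup, how many "Name:" lines it produced.
count_names_per_lookup() {
    awk '
    /^Server:/ { if (seen) print names; seen = 1; names = 0 }  # new lookup: report the previous one
    /^Name:/   { ++names }
    END        { if (seen) print names }                       # report the final lookup
    '
}
```

Run against the five-domain `nslookup < input.txt` output shown in the question, this prints 1, 2, 0, 8 and 0, one per line. Any value other than 0 or 1 is a lookup where a per-line counter drifts.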
Ed Morton
  • Hi, thank you for your help. Not quite, I am looking for: `./tst.sh google.com` I want to only store the domains that successfully resolve – p1r473 Oct 05 '19 at 17:44
  • Sorry, that output should say: `./tst.sh comingsoon.brightside.com google.com` I want to only store the domains that successfully resolve. I also wish to only store what I searched through nslookup (comingsoon.brightside.com) and not the output (elb-brightside-17469.aptible.in) – p1r473 Oct 05 '19 at 17:50
  • OK, I added a slightly modified version. Is that it? – Ed Morton Oct 05 '19 at 17:56
  • Thank you sir!!!!! You nailed it with that last one. 2 questions: 1. Where are the empty new lines coming from at the end of the output? 2. I'm going to test this now, but if I feed this a 70 megabyte file that takes days, is this running all in RAM at once? How will my poor little Raspberry Pi handle this? – p1r473 Oct 05 '19 at 17:59
  • 1. There are no empty newlines at the end, not sure what you're getting at there. 2. I've no idea what nslookup will do but the awk script will handle the nslookup output in the blink of an eye and will store the list of domains in memory. – Ed Morton Oct 05 '19 at 18:02
  • I seem to have a lot of new lines printed at the end of the output, but I can always remove those myself with a sed command. Okay, last question for you sir: there is one additional case not caught. `nslookup 00038a.net` produces a "No answer" response, which I am trying to capture with `/No answer/ { ++numResults; state="not found" }`, but this line doesn't seem to catch it. I have also tried `/Can.t find/ { ++numResults; state="not found" }` – p1r473 Oct 05 '19 at 18:06
  • Actually it looks like the second answer is still printing the total_garbage.com domain too. – p1r473 Oct 05 '19 at 18:14
  • Sounds like your nslookup output contains values that my script is not designed to handle so once again - edit your question to simply show us concise, testable sample input and expected output for the awk command you need help writing. Include the nslookup output that is not currently handled by my script and so is causing you problems now. Get rid of all that irrelevant stuff that's cluttering up your question and making it hard to figure out what it is you need help with. – Ed Morton Oct 05 '19 at 18:16
  • Thanks Ed, I am working on editing the original post now. – p1r473 Oct 05 '19 at 18:26
  • @DavidC.Rankin it's been a process... :-). – Ed Morton Oct 05 '19 at 21:19
  • @EdMorton thank you so much. Parsing works great. Let me see what happens when I feed it my 3.7 million queries! That's one thing that worries me rather than running line by line... – p1r473 Oct 05 '19 at 21:36
  • @EdMorton It's working for the first 500 queries so far. Going to let it run for a while. Do you accept donations? – p1r473 Oct 05 '19 at 21:58
  • Thanks for the offer but no, just help the next person. All the best! – Ed Morton Oct 05 '19 at 22:13
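
On the memory question raised in the comments: the awk script keeps the whole domain list in an array. If that becomes a concern on a small machine, one option is to process the list in fixed-size chunks so that only one chunk of names is held at a time, and a failed run can be resumed from a chunk boundary. This is only a sketch under that assumption; `for_each_chunk` is a hypothetical helper, and the command it runs per chunk is supplied by the caller.

```shell
# Hypothetical helper: split a large list into chunks and run a command on each.
# $1 = input file, $2 = lines per chunk, remaining arguments = command to run;
# the chunk file name is appended as the last argument of that command.
for_each_chunk() {
    local input=$1 lines=$2 tmpdir chunk status=0
    shift 2
    tmpdir=$(mktemp -d) || return 1
    # -a 4 gives four-letter suffixes (aaaa, aaab, ...) so many chunks fit.
    split -a 4 -l "$lines" "$input" "$tmpdir/chunk." || { rm -rf "$tmpdir"; return 1; }
    for chunk in "$tmpdir"/chunk.*; do
        [ -e "$chunk" ] || continue     # empty input: the glob matched nothing
        "$@" "$chunk" || status=$?      # remember the last failure but keep going
    done
    rm -rf "$tmpdir"
    return "$status"
}
```

For example, `for_each_chunk input.txt 10000 lookup_chunk >> output.txt`, where `lookup_chunk` would be your own function that runs the nslookup and awk pipeline on the one file it is given.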