3

I have a large file full of lines like this...

19:54:05 10.10.8.5 [SERVER] Response sent: www.example.com. type A by 192.168.4.5
19:55:10 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5
19:55:23 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5

I don't care about any of the other data, only what comes after "Response sent:". I'd like a list of the domain names sorted by how often each one occurs. The problem is that I won't know the domain names in advance, so I can't just search for a fixed string.

Using the example above I'd like the output to be along the lines of

ns1.example.com (2)
www.example.com (1)

...where the number in ( ) is the count of occurrences of that name.

How/what could I use to do this on Windows? The input file is .txt - the output file can be anything. Ideally a command-line process, but I'm really lost so I'd be happy with anything.

Cœur
notAduck
  • I _really_ want to help you, but I fear your question is not up to the standards of SO. This reads like a "gimme the codez" question. Is there _something_ you tried? – Matt Apr 21 '15 at 01:19
  • I've tried a few variations of grep command-line options based on other questions I've found on Stack Overflow, but honestly coding is not my day job; I was just given this after someone quit. What I've found always wants a known search string as input. So I tried taking the file above, sorting it in Excel to remove the extra stuff, then running grep across that for each of the domains, but it's way too slow and manual. I figured there has to be a better way. – notAduck Apr 21 '15 at 01:24

4 Answers

3

The cat is kinda out of the bag, so let's try and help a little. This is a PowerShell solution. If you have trouble with how it works, I encourage you to research the individual parts.

If your text file were "D:\temp\test.txt", you could do something like this.

$results = Select-String -Path D:\temp\test.txt -Pattern "(?<=sent: ).+(?= type)" | Select -Expand Matches | Select -Expand Value
$results | Group-Object | Select-Object Name,Count | Sort-Object Count -Descending

Using your input, you would get this output:

Name             Count
----             -----
ns1.example.com.     2
www.example.com.     1

Since there is regex I have saved a link that explains how it works.
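In short, the pattern relies on a lookbehind and a lookahead, so only the text between "sent: " and " type" ends up in the match. As a rough illustration, here is the same pattern in Python's `re` module. Python is a stand-in chosen for the sketch (the answer itself uses PowerShell), and `extract_domain` is a hypothetical helper name.

```python
import re

# Same pattern as in the Select-String call above.
# (?<=sent: )  lookbehind: the match must be preceded by "sent: "
# .+           the text we want (one or more characters)
# (?= type)    lookahead: the match must be followed by " type"
PATTERN = re.compile(r"(?<=sent: ).+(?= type)")


def extract_domain(line):
    """Return the text between 'sent: ' and ' type', or None if absent."""
    match = PATTERN.search(line)
    return match.group(0) if match else None


sample = "19:54:05 10.10.8.5 [SERVER] Response sent: www.example.com. type A by 192.168.4.5"
print(extract_domain(sample))  # www.example.com.
```

Neither the lookbehind nor the lookahead consumes any characters, which is why "sent: " and " type" do not show up in the result.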

Please keep in mind that SO is, of course, a site that helps programmers and programming enthusiasts. We are devoting our free time, whereas some people get paid to do this.

Matt
  • Apparently I'm too young on this site to upvote, but I'd give you all the upvotes I could. This really helps me out. I'll admit I don't quite understand the nuances of Stack Overflow, so I apologize if this was out of line, but I'm in a pinch to solve a DoS problem and this helps immensely. Thanks again! – notAduck Apr 21 '15 at 02:00
  • It's OK. Research effort and sample code can go a long way here. Perhaps we shall meet again later. You're welcome – Matt Apr 21 '15 at 02:01
2

Can you do it in PHP?

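<?php
// Path to the log file. 'input.txt' is an assumed name; adjust as needed.
$filename = 'input.txt';
$lines = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$domainarr = [];
foreach ($lines as $value) {
    $arr = explode(' ', $value);
    // The domain name is the sixth space-separated field.
    if (isset($arr[5])) {
        $domainarr[] = $arr[5];
    }
}

// Count each domain, then sort by count, highest first.
$occurrence = array_count_values($domainarr);
arsort($occurrence);

print_r($occurrence);
?>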
Aiken
  • Hey Aiken. While this might answer the broad question it's not even one of the languages tagged. – Matt Apr 21 '15 at 01:23
  • Hi Matt, thanks for the feedback. OP stated he's happy with anything, and this seemed the best way imho. I'm fairly new to the site; should I not have posted it? – Aiken Apr 21 '15 at 01:27
  • I'm appreciative regardless, but stupid question time: I'm pretty sure I can set up a PHP server for this, but I kind of thought that was an exclusively web-based language. So do I just put the input.txt file in the same directory as this .php file and reference it (how?), and then when I hit the .php page it will process the file and output the results in the browser? I would be more intelligent about this if it were an .asp, .bat or .ps1 script. – notAduck Apr 21 '15 at 01:31
  • Indeed, just put them together on a server that has PHP installed, run the file in which you have the PHP-script in the browser, and it will show you what you want. – Aiken Apr 21 '15 at 01:36
2

This is in batch:

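@echo off
setlocal enabledelayedexpansion
if exist temp.txt del temp.txt
rem Write the 6th field, the domain name, of every line to temp.txt.
rem Putting the redirection first avoids a trailing space on each line.
for /f "tokens=6" %%a in (input.txt) do (>>temp.txt echo %%a)
for /f %%a in (temp.txt) do (
set /a count=0
set v=%%a
rem A variable named after the domain marks it as already counted.
if "!%%a!" EQU "" (
rem /X counts only whole-line matches, so one name cannot match inside another.
for /f %%b in ('findstr /L /X /C:"%%a" "temp.txt"') do set /a count+=1
set %%a=count
rem Strip the trailing dot from the name before printing.
Echo !v:~0,-1! ^(!count!^)
)
)
del temp.txt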

Currently it prints the result to the screen. To redirect it to a text file, replace:

Echo !v:~0,-1! ^(!count!^)

with:

Echo !v:~0,-1! ^(!count!^) >> output.txt

With the sample data, this outputs:

www.example.com (1)
ns1.example.com (2)
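The batch output comes out in first-seen order rather than by count. For comparison, here is a minimal sketch of the same idea (take the 6th field, count it) with the result sorted most frequent first, in the "name (count)" format the question asks for. It is written in Python on the assumption that Python is available on the machine; `count_domains` and `format_report` are hypothetical helper names, not part of the batch script.

```python
from collections import Counter


def count_domains(lines):
    """Count the 6th whitespace-separated field of each line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 6:  # skip blank or malformed lines
            counts[fields[5].rstrip(".")] += 1  # drop the trailing dot
    return counts


def format_report(counts):
    """Return 'name (count)' strings, most frequent first."""
    return [f"{name} ({n})" for name, n in counts.most_common()]


sample = [
    "19:54:05 10.10.8.5 [SERVER] Response sent: www.example.com. type A by 192.168.4.5",
    "19:55:10 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5",
    "19:55:23 10.10.8.5 [SERVER] Response sent: ns1.example.com. type A by 192.168.4.5",
]
for row in format_report(count_domains(sample)):
    print(row)
# ns1.example.com (2)
# www.example.com (1)
```

For a real log, an open file object can be passed in place of `sample`. Iterating over a file reads it one line at a time, so a large file does not have to fit in memory.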

Monacraft
  • Thanks for this example - it works pretty well, although the output isn't sorted by greatest number of instances first, I can live with that. – notAduck Apr 21 '15 at 01:57
2

This Batch file solution should run faster:

@echo off
setlocal

rem Accumulate each occurrence in its corresponding array element
for /F "tokens=6" %%a in (input.txt) do set /A "count[%%a]+=1"

rem Show the result
for /F "tokens=2,3 delims=[]=" %%a in ('set count[') do echo %%a (%%b)

Output:

ns1.example.com. (2)
www.example.com. (1)

To store the result in a file, replace the last line with this one:

(for /F "tokens=2,3 delims=[]=" %%a in ('set count[') do echo %%a (%%b^)) > output.txt
Aacini