
I have a log file of 10 hits, e.g. one line is:

127.0.0.1 - - [10/Oct/2007:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "http://www.example.com/links.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

The format of each line is the same, i.e. the IP address is always at the start.

I've currently read in the file using fopen and fgets, but now I want to count how many unique IPs there are in the file, as well as how many times each IP 'hits'. I'm not sure how to attempt this. Any tips on how I would go about it?

hturner
  • 1
    It depends on whether the file format is fixed and you know where to look for the IP addresses, or whether you need to scan the lines for patterns that *look* like IP addresses. The latter would be approximate, since URLs may contain such patterns, producing false positives. – chqrlie Nov 24 '15 at 23:03
  • If the IP address is always at the beginning, it is relatively easy to parse them with `sscanf` after checking for proper format. – chqrlie Nov 24 '15 at 23:05
  • " e.g. **one** line is:" **4** lines of data? – chux - Reinstate Monica Nov 24 '15 at 23:10
  • @chux Sorry, that's the way it copied in. All the data is supposed to be on one line – hturner Nov 25 '15 at 07:53
  • @chqrlie The IP is always at the start so I will have a look into sscanf, thanks – hturner Nov 25 '15 at 07:53

1 Answer


Code can march through the file looking for the ddd.ddd.ddd.ddd pattern.

Avoid using "%d" or "%u", as they accept leading whitespace and a leading '-' or '+'.
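To illustrate the point about "%d", here is a small sketch comparing it with a digits-only scanset. The helper names `accepts_with_d` and `accepts_with_scanset` are made up for this illustration and are not part of the answer's code.

```c
#include <stdio.h>

/* Returns 1 if "%d" manages to convert the start of s, 0 otherwise.
   "%d" skips leading whitespace and accepts an optional sign. */
int accepts_with_d(const char *s) {
    int value;
    return sscanf(s, "%d", &value) == 1;
}

/* Returns 1 if a digits-only scanset matches the start of s, 0 otherwise.
   "%[" does not skip whitespace and only matches the listed characters. */
int accepts_with_scanset(const char *s) {
    char buf[4]; /* up to 3 digits plus the terminating '\0' */
    return sscanf(s, "%3[0-9]", buf) == 1;
}
```

With these helpers, " 127", "-127" and "+127" are all accepted by the "%d" version but rejected by the scanset version, while a plain "127" is accepted by both.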

Pseudo code

Read from a file until EOF found
  repeatedly look for a digit
  if it is found
    note position
    put digit back into stream
    look for ddd.ddd.ddd.ddd
    if found
      decode (and test for values > 255)
      if successful return result
    go back to position

return fail value;

Sample code. Should also have IO error checking.

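#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

// Returns the next IPv4 address found in the stream, packed into the low
// 32 bits, or 0 if none is found before EOF.
unsigned long Parse_IP(FILE *inf) {
  int ch;
  while ((ch = fgetc(inf)) != EOF) {
    if (isdigit(ch)) {
      // Remember the position just past this digit so that a failed
      // match resumes scanning from there.
      long pos = ftell(inf);
      ungetc(ch, inf);
      char buf[4][4];
      // Scansets, unlike "%d", reject leading spaces and signs.
      int count = fscanf(inf, "%3[0-9].%3[0-9].%3[0-9].%3[0-9]",
          buf[0], buf[1], buf[2], buf[3]);
      if (count == 4) {
        unsigned long ip = 0;
        int i;
        for (i=0; i<4; i++) {
          int octet = atoi(buf[i]);
          if (octet > 255) break;
          ip = ip*256 + octet;
        }
        if (i == 4) return ip;
      }
      fseek(inf, pos, SEEK_SET);
    }
  }
  return 0;
}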

Sample usage

unsigned long ip;
while ((ip = Parse_IP(inf)) != 0) {
  printf("ip %08lX\n", ip);
}
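To get from parsed addresses to the counts the question asks for, something along these lines might work. This is a minimal sketch, not part of the code above: it assumes the IP is always at the start of a line already read with `fgets`, and that the number of distinct addresses is small enough for a fixed array with linear search. `MAX_IPS`, `ip_table`, `parse_leading_ip`, `count_line` and `print_counts` are hypothetical names.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_IPS 64 /* assumed upper bound on distinct addresses */

struct ip_count { unsigned long ip; unsigned hits; };
struct ip_table { struct ip_count entries[MAX_IPS]; size_t used; };

/* Parse a dotted quad at the very start of line. Returns 1 and stores the
   packed address in *out on success, 0 otherwise. */
int parse_leading_ip(const char *line, unsigned long *out) {
    char part[4][4];
    int n = 0;
    if (sscanf(line, "%3[0-9].%3[0-9].%3[0-9].%3[0-9]%n",
               part[0], part[1], part[2], part[3], &n) != 4) return 0;
    if (isdigit((unsigned char)line[n])) return 0; /* e.g. "0.0.0.2550" */
    unsigned long ip = 0;
    for (int i = 0; i < 4; i++) {
        unsigned long octet = strtoul(part[i], NULL, 10);
        if (octet > 255) return 0;
        ip = ip * 256 + octet;
    }
    *out = ip;
    return 1;
}

/* Count one log line. Returns 1 if counted, 0 if the line has no leading
   address or the table is full. */
int count_line(struct ip_table *t, const char *line) {
    unsigned long ip;
    if (!parse_leading_ip(line, &ip)) return 0;
    for (size_t i = 0; i < t->used; i++) {
        if (t->entries[i].ip == ip) { t->entries[i].hits++; return 1; }
    }
    if (t->used == MAX_IPS) return 0;
    t->entries[t->used].ip = ip;
    t->entries[t->used].hits = 1;
    t->used++;
    return 1;
}

/* Print the number of unique addresses, then the hit count for each. */
void print_counts(const struct ip_table *t) {
    printf("%lu unique IPs\n", (unsigned long)t->used);
    for (size_t i = 0; i < t->used; i++) {
        unsigned long ip = t->entries[i].ip;
        printf("%lu.%lu.%lu.%lu: %u hits\n", (ip >> 24) & 255,
               (ip >> 16) & 255, (ip >> 8) & 255, ip & 255, t->entries[i].hits);
    }
}
```

Starting from a zero-initialised `struct ip_table`, each line returned by `fgets` would be passed to `count_line`; afterwards `used` holds the number of unique IPs and each entry holds its hit count. For a large log, a hash table or a sorted array would likely be a better fit than linear search.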
chux - Reinstate Monica
  • Your code may fail for pipe or console input as you assume you can `fseek` backwards, and it will mistakenly match IP addresses for these patterns: `9127.0.0.1`, `0.0.0.2550` etc. – chqrlie Nov 24 '15 at 23:49
  • I'm being picky, but why not accept `0.0.0.0` as a valid IP address pattern? – chqrlie Nov 24 '15 at 23:57
  • 1
    @chqrlie The method posted returns a 32+ bit integer using _some_ value as a failure indication. Code may easily be amended to return a `long long` with -1 as the bad boy, or to save the IP address in a passed pointer location and return an `int` of 0 or 1. `0` seemed a natural choice for "invalid value". No compelling reason for `0` – chux - Reinstate Monica Nov 25 '15 at 01:58
  • Vintage @chux - logical, well validated, and to the point! (all clear this time `:)` – David C. Rankin Nov 25 '15 at 09:34
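
The caveats raised in the comments above (no `fseek` on pipes or the console, matches inside longer digit runs such as `9127.0.0.1` or `0.0.0.2550`, and `0.0.0.0` doubling as the failure value) could be addressed by scanning a line already held in memory and returning the address through a pointer. The following is only a sketch under those assumptions; `read_octet` and `find_ip` are hypothetical names, not code from the answer.

```c
#include <ctype.h>
#include <stddef.h>

/* Read one octet (1 to 3 digits, value <= 255) at *p.
   On success store it in *out, advance *p past it and return 1. */
static int read_octet(const char **p, unsigned *out) {
    const char *s = *p;
    unsigned value = 0;
    int digits = 0;
    while (digits < 3 && isdigit((unsigned char)*s)) {
        value = value * 10 + (unsigned)(*s - '0');
        s++;
        digits++;
    }
    if (digits == 0 || value > 255) return 0;
    *out = value;
    *p = s;
    return 1;
}

/* Find the first dotted quad in s that is not directly preceded or
   followed by another digit. Returns 1 and stores the packed address in
   *out, or returns 0 if there is none. No seeking is needed because the
   whole line is already in memory. */
int find_ip(const char *s, unsigned long *out) {
    for (const char *p = s; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p)) continue;
        if (p > s && isdigit((unsigned char)p[-1])) continue; /* left boundary */
        const char *q = p;
        unsigned long ip = 0;
        int i;
        for (i = 0; i < 4; i++) {
            unsigned octet;
            if (i > 0) {
                if (*q != '.') break;
                q++;
            }
            if (!read_octet(&q, &octet)) break;
            ip = ip * 256 + octet;
        }
        if (i == 4 && !isdigit((unsigned char)*q)) { /* right boundary */
            *out = ip;
            return 1;
        }
    }
    return 0;
}
```

Because success is reported through the return value, `0.0.0.0` can be stored in `*out` like any other address. A dotted quad followed by a further `.digit` group, such as a version number with five parts, would still match on its first four parts, so stricter boundary rules may be wanted depending on the input.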