In class, we were assigned to parse an access log and record all the successful requests. While working through the Apache web server access log, I ran into a few entries, roughly 3 in every 100,000, whose format looked incorrect. For example, an entry would appear as `96.45.3.2 - - [14/Mar/2011:00:12:33 -0400] "GET webpage.html HTTP/1.1" 400 236 - -`

I am not asking how to parse the file; I'm just curious why an entry would end up looking unfinished like this. Did the user's browser fail? A power outage? Something else?

Also, when parsing the file, I noticed one specific entry where index [8] (which is supposed to hold the status code, such as 200, 300, 400 or 500) instead contained `1.1"`.

Any ideas?

kapa
Seth Kania

1 Answer

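A `-` just indicates that the information for that field is not available (http://httpd.apache.org/docs/2.2/logs.html). The last two hyphens simply mean that whatever those fields are supposed to record was not present when the entry was written. If the log uses something like Apache's Combined Log Format, those two fields are the Referer and User-Agent request headers, so the client most likely just did not send them; the entry itself is complete.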
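
To see which fields the hyphens stand for, here is a minimal Python sketch. It assumes the log uses Apache's Combined Log Format (the `LogFormat` line in the comment is the stock `combined` definition from the Apache docs) and that the entry has already been split into its nine logical fields; `COMBINED_FIELDS` and `missing_fields` are hypothetical names for illustration.

```python
# Field order of Apache's Combined Log Format:
# LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
COMBINED_FIELDS = ["host", "ident", "user", "time", "request",
                   "status", "size", "referer", "user_agent"]


def missing_fields(values):
    """Return the names of fields that Apache logged as '-' (not available)."""
    return [name for name, value in zip(COMBINED_FIELDS, values) if value == "-"]


# The entry from the question, already split into its nine logical fields
values = ["96.45.3.2", "-", "-", "14/Mar/2011:00:12:33 -0400",
          "GET webpage.html HTTP/1.1", "400", "236", "-", "-"]
print(missing_fields(values))  # ['ident', 'user', 'referer', 'user_agent']
```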

As for the second part, about index[8]: you are making assumptions about how the log line is formatted. I would bet that you are splitting on spaces and that there is an extra space earlier in the line, which shifts every later field one position to the right.
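
The shift is easy to reproduce. This minimal Python sketch uses two made-up lines based on the one in the question, the second containing the malformed request line `GET webpage.html HTTP 1.1` suggested in the comments below, and splits each on whitespace:

```python
# Well-formed entry, based on the one in the question
good = '96.45.3.2 - - [14/Mar/2011:00:12:33 -0400] "GET webpage.html HTTP/1.1" 400 236 - -'
# Hypothetical malformed entry: a space instead of "/" in the protocol
bad = '96.45.3.2 - - [14/Mar/2011:00:12:33 -0400] "GET webpage.html HTTP 1.1" 400 236 - -'

# Naive whitespace splitting treats every space as a field separator
print(good.split()[8])  # 400
print(bad.split()[8])   # 1.1"
```

Even in the well-formed line, the status only lands at index 8 because the bracketed timestamp and the quoted request are themselves broken into several tokens; one extra space inside the quotes pushes it one position further.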

Lance Helsten
  • The second is probably an invalid request where someone sent a request like `GET webpage.html HTTP 1.1` – Swiss Mar 12 '12 at 21:53
  • So the problem is that someone sent `"GET webpage.html HTTP/ 1.1"`, a space where there shouldn't be one? – Seth Kania Mar 12 '12 at 21:56
  • @Seth, no, the problem is that you should treat the part between the quotes as a single field and not split on the spaces inside it at all – John La Rooy Mar 12 '12 at 22:41
  • The "GET xxxxxxx" part is whatever the client sent to host:80 as the HTTP request line. Well-formed requests use %20 instead of a space in URIs, but if a client *did* send a space, that would cause the offset you describe. That's why this field is quoted. – TerryE Mar 12 '12 at 23:16
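
Following the suggestion in the comments to treat the quoted part as one field, here is a minimal Python sketch using a regular expression that captures the bracketed timestamp and the quoted request line as single fields. The pattern, `parse_line` and `is_successful` are hypothetical; the pattern only covers the leading Common Log Format fields (`%h %l %u %t "%r" %>s %b`) and ignores anything after the size, and "successful" is assumed to mean a 2xx status code.

```python
import re

# Hypothetical pattern for the leading Common Log Format fields:
# %h %l %u %t "%r" %>s %b. The timestamp (between [ ]) and the request line
# (between " ") are each captured as ONE field, so spaces inside them cannot
# shift the fields that follow. Anything after the size is ignored.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '  # also accepts backslash-escaped chars such as \"
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_successful(entry):
    """Assumption: a 'successful' request is one with a 2xx status code."""
    return entry is not None and entry["status"].startswith("2")


lines = [
    '96.45.3.2 - - [14/Mar/2011:00:12:33 -0400] "GET webpage.html HTTP/1.1" 400 236 - -',
    '96.45.3.2 - - [14/Mar/2011:00:12:34 -0400] "GET webpage.html HTTP 1.1" 400 236 - -',
    '10.0.0.1 - - [14/Mar/2011:00:12:35 -0400] "GET /index.html HTTP/1.1" 200 1043 - -',
]

for line in lines:
    entry = parse_line(line)
    print(entry["status"], entry["request"])
# 400 GET webpage.html HTTP/1.1
# 400 GET webpage.html HTTP 1.1
# 200 GET /index.html HTTP/1.1

successful = [e for e in map(parse_line, lines) if is_successful(e)]
print(len(successful))  # 1
```

Lines that do not match return `None`, so they can be counted or inspected separately instead of silently producing shifted fields.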