How parse the log file use python , and output html?

Question

Here is one line of log file :

41.42.50.xxx - - [09/Oct/2012:00:00:01 +0200] "GET http://www.xxxxxx.com/solutions-ar/solutions-1466.php HTTP/1.1" 200 10 "http://www.google.com.eg/url?dfasdfeaefdf" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/537.4"

i want to parse the ip address, time, url, google url and browser to single line, i use (r'^(((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?))') to match the ip address, how can i get the other info and output html ? Thanks

That regex is really overly paranoid...`^(?:\d+\.){3}\d+` should be fine. — nneonneo, Oct 09 '12 at 05:58
Thanks, i'll improve it, but how can i parse the other info rest of the line ? — Hisone Nightmare, Oct 09 '12 at 06:01
Do you have any other examples of log lines? Would help with things like date, etc.--will the month always be three letters? Will anything else ever be inside `[]`? — jdotjdot, Oct 09 '12 at 06:02
@jdotjdot89 : yes, everyline's structure is the same, the time ever be inside [] — Hisone Nightmare, Oct 09 '12 at 06:06

score 3 · Answer 1 · answered Oct 09 '12 at 06:02

3

Use a library like apachelog to parse the Apache log lines. It will be more robust and safer than trying to write a regex for the lines.

answered Oct 09 '12 at 06:02

nneonneo

171,345
36
312
383

score 2 · Accepted Answer · answered Oct 09 '12 at 06:10

2

IP Address: r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
Time: r'\d{2}/[a-zA-Z]{3}/\d{4}:\d{2}:\d{2}:\d{2} \+\d{4}'
Time (alternate): r'(?<=\[).+?(?=\])', lazy, assuming date will always be inside [] and only date will ever be inside []
URL: r'https?://.+?(?= HTTP)'
Google URL: r'(?<=")https?://.*?google\..*?(?=")'
Browser: r'(?<=")Mozilla.+?(?=")'

However, as nneonneo pointed out, using a tool like apachelog will be a lot more robust and reliable.

answered Oct 09 '12 at 06:10

jdotjdot

16,134
13
66
118

Thank you very much~ by the way, i parse the url from resources without google, and like bing, yahoo search engin and so on, i want to parse the keyword behind ?q= or q=, how can i match the keyword ? i use (?<=q=)|(? – Hisone Nightmare Oct 11 '12 at 02:45
For a new question, it's best if you open up a new Stack Overflow question. I'd be happy to answer it there. Feel free to message me the link when you've done so so I can answer it. – jdotjdot Oct 11 '12 at 02:51
http://stackoverflow.com/questions/12831537/parse-and-match-the-keyword-in-search-engine-url-use-python-re Thank you again : ) – Hisone Nightmare Oct 11 '12 at 03:02

How parse the log file use python , and output html?

2 Answers2