I'm using awk to analyze some access log files. I'm currently using the following:
awk '($9 ~ /404/)' access_log | awk '{print $9,$7}' | sort | uniq -c | sort > 404.txt
This returns every 404 in my access log along with the number of times it appears. However, it returns every kind of request, and I'm only interested in HTML pages.
How can I modify this to only return values for requests that end in .html?
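For illustration, here is a minimal sketch of one way the filter might look. It assumes the log is in the Apache combined format, so that $7 is the request path and $9 is the status code, as in the command above. The file name sample_access_log and its contents are made up for the example.

```shell
# Hypothetical sample data in Apache combined log format (not from a real log).
cat > sample_access_log <<'EOF'
127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /missing.html HTTP/1.1" 404 209 "-" "curl"
127.0.0.1 - - [10/Oct/2023:13:55:37 +0000] "GET /missing.html HTTP/1.1" 404 209 "-" "curl"
127.0.0.1 - - [10/Oct/2023:13:55:38 +0000] "GET /logo.png HTTP/1.1" 404 209 "-" "curl"
127.0.0.1 - - [10/Oct/2023:13:55:39 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl"
EOF

# Keep only lines whose status is 404 AND whose request path ends in .html,
# then count identical (status, path) pairs and sort by that count.
awk '$9 == 404 && $7 ~ /\.html$/ {print $9, $7}' sample_access_log \
  | sort | uniq -c | sort -n > 404.txt
```

Both conditions fit in a single awk call, so the second awk in the pipeline is not needed. Note that the `\.html$` pattern anchors at the end of the field, so a path with a query string such as /page.html?x=1 would not match; a pattern like `\.html(\?|$)` might be used if those should count too.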