I have been getting a lot of hits in my logs that crawl most top-level pages of my site and report a Java version as the User-Agent (the referrer field is empty).
I see different Java versions in the User-Agent string, e.g. Java/1.6.0_04, Java/1.4.1_04, Java/1.7.0_25, etc.
Sometimes, but not always, /contact/ returns a 404, while none of the other pages listed below do.
According to Project Honeypot, the IPs are almost always spam harvesters and bots. Here is a sample:
78.129.252.190 - - [24/Jan/2014:01:28:52 -0800] "GET / HTTP/1.1" 200 6728 "-" "Java/1.6.0_04" 198 7082
78.129.252.190 - - [24/Jan/2014:01:28:55 -0800] "GET /about HTTP/1.1" 301 - "-" "Java/1.6.0_04" 203 352
78.129.252.190 - - [24/Jan/2014:01:28:55 -0800] "GET /about/ HTTP/1.1" 200 29933 "-" "Java/1.6.0_04" 204 30330
78.129.252.190 - - [24/Jan/2014:01:28:56 -0800] "GET /articles-columns HTTP/1.1" 301 - "-" "Java/1.6.0_04" 214 363
78.129.252.190 - - [24/Jan/2014:01:28:57 -0800] "GET /articles-columns/ HTTP/1.1" 200 29973 "-" "Java/1.6.0_04" 215 30370
78.129.252.190 - - [24/Jan/2014:01:28:58 -0800] "GET /contact HTTP/1.1" 301 - "-" "Java/1.6.0_04" 205 354
78.129.252.190 - - [24/Jan/2014:01:28:58 -0800] "GET /contact/ HTTP/1.1" 200 47424 "-" "Java/1.6.0_04" 206 47827
What are they looking for? A vulnerability?
Can I block these visits by their Java User-agent? If so, how? Or with a PHP function?
(I know how to block IPs in .htaccess, but blocking by User-agent is a more proactive method for me.)
Update 2/04/14: I'm not able to block a Java User-agent with either of these two rules:
RewriteCond %{HTTP_USER_AGENT} Java/1.6.0_04
RewriteRule ^.*$ - [F]
RewriteCond %{HTTP_USER_AGENT} ^Java
RewriteRule ^.*$ - [F]
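For comparison, a fuller version of the second rule might look like the sketch below. It assumes that mod_rewrite is available, that the host allows rewrite directives in .htaccess, and that the block sits above any CMS rewrite rules in the file. None of that is confirmed for this setup.

```apache
# Sketch only: assumes mod_rewrite is enabled and .htaccess overrides are permitted.
# RewriteCond/RewriteRule are ignored unless the engine is switched on.
RewriteEngine On
# Match any User-Agent that starts with "Java/" (case-insensitive).
RewriteCond %{HTTP_USER_AGENT} ^Java/ [NC]
# Answer every matching request with 403 Forbidden.
RewriteRule ^ - [F]
```

If a rule like this has any effect, the matching requests should show up in the access log with status 403 instead of 200 or 301.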
Note: I'm on shared hosting and have limited access to the Apache configs.
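Because rewrite directives are sometimes restricted on shared hosting, another option that might be usable from .htaccess is mod_setenvif. The sketch below assumes Apache 2.2-style access directives and that the host permits them in .htaccess. The variable name block_java_ua is made up for illustration.

```apache
# Sketch only: assumes mod_setenvif and Apache 2.2 access control (Order/Allow/Deny).
# block_java_ua is a hypothetical environment variable name.
# Flag any request whose User-Agent starts with "Java/" (case-insensitive).
SetEnvIfNoCase User-Agent "^Java/" block_java_ua
# Allow everyone by default, then deny the flagged requests.
Order Allow,Deny
Allow from all
Deny from env=block_java_ua
```

On Apache 2.4 the last three lines would instead be expressed with Require directives, for example "Require all granted" and "Require not env block_java_ua" wrapped in a RequireAll block.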