This morning, many of my websites were tagged "This site may be compromised" by Google in its search results. These are sites under my supervision on my own VPS. I've run a deep scan on the server and found nothing unusual. I've also looked for suspicious .htaccess files and for JavaScript injection, and found nothing wrong so far.

Yesterday, I put an .htaccess file in my web root to make sure that SQL injection, JavaScript injection, base64 payloads, and other suspicious requests can't be used to attack my server.

So I suspect that Google added "This site may be compromised" because I added this protection to all my websites.

Here is the content of this .htaccess:

RewriteEngine On
RewriteCond %{REQUEST_URI} !^/robots.txt
RewriteCond %{REQUEST_URI} !^/sitemap.xml

RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR] 
RewriteCond %{HTTP_USER_AGENT} ^[bcdfghjklmnpqrstvwxz\ ]{8,}|^[0-9a-z]{15,}|^[0-9A-Za-z]{19,}|^[A-Za-z]{3,}\ [a-z]{4,}\ [a-z]{4,} [OR]
RewriteCond %{HTTP_USER_AGENT} ^<sc|<\?|^adwords|@nonymouse|Advanced\ Email\ Extractor|almaden|anonymous|Art-Online|autoemailspider|blogsearchbot-martin|CherryPicker|compatible\ \;|Crescent\ Internet\ ToolPack|Digger|DirectUpdate|Download\ Accelerator|^eCatch|echo\ extense|EmailCollector|EmailWolf|Extractor|flashget|frontpage|Go!Zilla|grub\ crawler|HTTPConnect|httplib|HttpProxy|HTTP\ agent|HTTrack|^ia_archive|IDBot|id-search|Indy\ Library|^Internet\ Explorer|^IPiumBot|Jakarta\ Commons|^Kapere|Microsoft\ Data|Microsoft\ URL|^minibot\(NaverRobot\)|^Moozilla|^Mozilla$|^MSIE|MJ12bot|Movable\ Type|NICErsPRO|^NPBot|Nutch|Nutscrape/|^Offline\ Explorer|^Offline\ Navigator|OmniExplorer|^Program\ Shareware|psycheclone|PussyCat|PycURL|python|QuepasaCreep|SiteMapper|Star\ Downloader|sucker|SurveyBot|Teleport\ Pro|Telesoft|TrackBack|Turing|TurnitinBot|^user|^User-Agent:\ |^User\ Agent:\ |vobsub|webbandit|WebCapture|webcollage|WebCopier|WebDAV|WebEmailExtractor|WebReaper|WEBsaver|WebStripper|WebZIP|widows|Wysigot|Zeus|Zeus.*Webster [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^curl|^Fetch\ API\ Request|GT\:\:WWW|^HTTP\:\:Lite|httplib|^Java/1.|^Java\ 1.|^LWP|libWeb|libwww|^PEAR|PECL\:\:HTTP|PHPCrawl|python|Rsync|Snoopy|^URI\:\:Fetch|WebDAV|^Wget [NC]
RewriteRule (.*) - [F]

RewriteCond %{REQUEST_METHOD} (GET|POST) [NC]
RewriteCond %{QUERY_STRING} ^(.*)(%3C|<)/?script(.*)$ [NC,OR]
RewriteCond %{QUERY_STRING} ^(.*)(%3D|=)?javascript(%3A|:)(.*)$ [NC,OR]
RewriteCond %{QUERY_STRING} ^(.*)document\.location\.href(.*)$ [OR]
RewriteCond %{QUERY_STRING} ^(.*)(%3D|=)http(%3A|:)(/|%2F){2}(.*)$ [NC,OR]
RewriteCond %{QUERY_STRING} ^(.*)base64_encode(.*)$ [OR]
RewriteCond %{QUERY_STRING} ^(.*)GLOBALS(=|\[|%[0-9A-Z]{0,2})(.*)$ [OR]
RewriteCond %{QUERY_STRING} ^(.*)_REQUEST(=|\[|%[0-9A-Z]{0,2})(.*)$ [OR]
RewriteCond %{QUERY_STRING} ^(.*)(SELECT(%20|\+)|UNION(%20|\+)ALL|INSERT(%20|\+)|DELETE(%20|\+)|CHAR\(|UPDATE(%20|\+)|REPLACE(%20|\+)|LIMIT(%20|\+))(.*)$ [NC]
RewriteRule (.*) - [F]

There are a lot of keywords in this file related to hacking terminology. Is there any way that Google might look into the .htaccess file?

Should I block Google from this .htaccess with robots.txt, or could/should I add a line directly to the .htaccess to stop Google from scanning this file?

What do you think?

AstroCB
Jaune Citron

1 Answer

If .htaccess is visible from outside, then you have a serious problem. That file should never be visible to anybody accessing the site through HTTP. Blocking it in robots.txt would only prevent well-behaved bots from looking at it; bots that ignore robots.txt would still have access.

If you suspect that your .htaccess is the cause of the problem, you need to make sure that it can't be served. That's the default on Apache, but if you were mucking around with permissions I suppose you could have exposed it. If you did, you need to fix that.
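For reference, here is a minimal sketch of the kind of rule that keeps these files from being served. It assumes Apache 2.4 syntax and that the block lives in the main server configuration (or a virtual host); whether your installation already contains an equivalent block is something to check rather than assume.

```apache
# Refuse to serve any file whose name starts with ".ht"
# (.htaccess, .htpasswd, ...) over HTTP.
# Apache 2.4 syntax. On Apache 2.2 the equivalent inside the
# <Files> block is "Order allow,deny" followed by "Deny from all".
<Files ".ht*">
    Require all denied
</Files>
```

With a rule like this in effect, requesting /.htaccess from a browser should return 403 Forbidden instead of the file's contents.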

I think you need to look somewhere else for the cause of Google's "this site may be compromised" message. A Google (or Bing) search on [this site may be compromised] reveals lots of information about why that warning might appear.

Jim Mischel
  • I just thought it could be some fancy way for Google to access it. Just found the real crap: a pharmacy hack injected via base64 two weeks ago. The .htaccess that has been in place since this morning prevents this crap! Thanks for your quick answer, @Jim Mischel. I will now clean up this mess and ask Google to rescan the site and remove the "compromised" tag. – Jaune Citron Mar 08 '13 at 02:51