This morning, a lot of my website where tagged "this site may be compromised" by Google in it's result. Sites that are under my supervision on my own VPS server. I'ved run a deep scan on it and nothing unsual. I'ved look for suspicious htaccess and for javascript injection and nothing wrong so far.
Yesterday, I put an htaccess file to my web root to insure no sql, javascript, base64 and any other suspicious hacking solution might attack my server.
So I do suspect that Google add "this site may be compromised" since I add this protection to all my web sites.
there is the content of this htaccess :
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/robots.txt
RewriteCond %{REQUEST_URI} !^/sitemap.xml
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^[bcdfghjklmnpqrstvwxz\ ]{8,}|^[0-9a-z]{15,}|^[0-9A-Za-z]{19,}|^[A-Za-z]{3,}\ [a-z]{4,}\ [a-z]{4,} [OR]
RewriteCond %{HTTP_USER_AGENT} ^<sc|<\?|^adwords|@nonymouse|Advanced\ Email\ Extractor|almaden|anonymous|Art-Online|autoemailspider|blogsearchbot-martin|CherryPicker|compatible\ \;|Crescent\ Internet\ ToolPack|Digger|DirectUpdate|Download\ Accelerator|^eCatch|echo\ extense|EmailCollector|EmailWolf|Extractor|flashget|frontpage|Go!Zilla|grub\ crawler|HTTPConnect|httplib|HttpProxy|HTTP\ agent|HTTrack|^ia_archive|IDBot|id-search|Indy\ Library|^Internet\ Explorer|^IPiumBot|Jakarta\ Commons|^Kapere|Microsoft\ Data|Microsoft\ URL|^minibot\(NaverRobot\)|^Moozilla|^Mozilla$|^MSIE|MJ12bot|Movable\ Type|NICErsPRO|^NPBot|Nutch|Nutscrape/|^Offline\ Explorer|^Offline\ Navigator|OmniExplorer|^Program\ Shareware|psycheclone|PussyCat|PycURL|python|QuepasaCreep|SiteMapper|Star\ Downloader|sucker|SurveyBot|Teleport\ Pro|Telesoft|TrackBack|Turing|TurnitinBot|^user|^User-Agent:\ |^User\ Agent:\ |vobsub|webbandit|WebCapture|webcollage|WebCopier|WebDAV|WebEmailExtractor|WebReaper|WEBsaver|WebStripper|WebZIP|widows|Wysigot|Zeus|Zeus.*Webster [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^curl|^Fetch\ API\ Request|GT\:\:WWW|^HTTP\:\:Lite|httplib|^Java/1.|^Java\ 1.|^LWP|libWeb|libwww|^PEAR|PECL\:\:HTTP|PHPCrawl|python|Rsync|Snoopy|^URI\:\:Fetch|WebDAV|^Wget [NC]
RewriteRule (.*) - [F]
RewriteCond %{REQUEST_METHOD} (GET|POST) [NC]
RewriteCond %{QUERY_STRING} ^(.*)(%3C|<)/?script(.*)$ [NC,OR]
RewriteCond %{QUERY_STRING} ^(.*)(%3D|=)?javascript(%3A|:)(.*)$ [NC,OR]
RewriteCond %{QUERY_STRING} ^(.*)document\.location\.href(.*)$ [OR]
RewriteCond %{QUERY_STRING} ^(.*)(%3D|=)http(%3A|:)(/|%2F){2}(.*)$ [NC,OR]
RewriteCond %{QUERY_STRING} ^(.*)base64_encode(.*)$ [OR]
RewriteCond %{QUERY_STRING} ^(.*)GLOBALS(=|[|%[0-9A-Z]{0,2})(.*)$ [OR]
RewriteCond %{QUERY_STRING} ^(.*)_REQUEST(=|[|%[0-9A-Z]{0,2})(.*)$ [OR]
RewriteCond %{QUERY_STRING} ^(.*)(SELECT(%20|\+)|UNION(%20|\+)ALL|INSERT(%20|\+)|DELETE(%20|\+)|CHAR\(|UPDATE(%20|\+)|REPLACE(%20|\+)|LIMIT(%20|\+))(.*)$ [NC]
RewriteRule (.*) - [F]
There is a lot of keyword within this file regarding hacking terminology ... is there any way that Google might look into the htaccess file ?
Should I block google with a robots.txt for this htaccess only or could/should I add a line of code directly into the htaccess to block Google for scanning this file... ?
What do you think ?