
I am using owncloud and sometimes share links with people via Facebook.

I am concerned about automatic crawling, so I would like to deny Facebook access to my owncloud.thomas-steinbrenner.net server (it visits every shared link in order to fetch preview pictures, preview text, etc.).

Is there a way to do this in nginx, e.g. via hostname or user agent? (Blocking by IP feels like a game one cannot win.)

If not: is there some other way, such as a blacklist project that maintains lists of government, Facebook, etc. IPs for iptables?

Tie-fighter
  • If I understand correctly, you only need to block by referer; [try something like this](http://wiki.nginx.org/Referrer_Spam_Blocking) – user Jul 05 '13 at 11:57

2 Answers


TCP wrappers? I believe those can do host/domain-based denies. Also, have you tried a simple robots.txt? I would be surprised if Facebook didn't respect it; I'd think they could not afford the controversy of ignoring it.
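For reference, Facebook's link-preview crawler identifies itself with the user agent string facebookexternalhit. Assuming the crawler honors robots.txt (crawlers fetching link previews do not always do so, so treat this as a first line of defense only), a minimal entry at the web root would look like this:

```
# /robots.txt at the site root (sketch; assumes the crawler honors it)
User-agent: facebookexternalhit
Disallow: /
```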

Jason Tan

nginx supports the $http_user_agent value out of the box:

if ($http_user_agent ~* "(facebook|google)") {
   ...
}
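Putting that together, a minimal sketch of a server block that returns 403 to such requests might look like the following (facebookexternalhit and facebot are the strings Facebook's crawler is known to send; the server_name is taken from the question):

```nginx
server {
    listen 80;
    server_name owncloud.thomas-steinbrenner.net;

    # Deny requests whose User-Agent matches Facebook's crawlers
    # (case-insensitive match; extend the pattern as needed)
    if ($http_user_agent ~* "(facebookexternalhit|facebot)") {
        return 403;
    }

    location / {
        # ... normal ownCloud configuration ...
    }
}
```

Note that the User-Agent header is trivially spoofed, so this only stops well-behaved crawlers.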

Hostname verification can be done via a third-party module, ngx_http_rdns_module: http://wiki.nginx.org/HttpRdnsModule (https://github.com/flant/nginx-http-rdns)

Like this:

location / {
    rdns double;                             # PTR lookup, then verify it with a forward lookup
    rdns_deny ^.*\.(facebook|google)\.com$;  # deny clients whose verified hostname matches
}