Scenario

You're a photographer who creates custom projects for clients and publishes them in a space on your website (http://we-photography.com/projects).

All projects are different in theme and content, and as a result they range from a rating of U (G) to 18 (R).

All projects are initially hidden from visitors to the site, but are publicly available to anyone who has the correct URL to any given project. This allows clients to direct specific audiences to the pages that feature their photos.

The Problem

A few weeks ago I did a random search for an old username that I once used, and found it listed in the signature of a forum that I used to visit.

This made me think:

If I use NoFollow and NoIndex on projects that are rated as 18 content, I should in theory be protecting certain audiences from accessing that material.

However, if a client posts the URL to their work on a forum, social network or website, then potentially anyone who searches for http://we-photography.com/projects will find that link.

So, is there a solution to keep the URL from being listed?

One obvious solution is to use a shortening site like bit.ly to create a link for each client, but that is no guarantee of keeping the URL safe, as other visitors could copy the full URL and post it anywhere.

Alternatively, I could use multiple names for the projects folder (projects/, clients/), so clean content is placed under one name and adult content under another. This may help, but only until someone searches for the URL together with the specific subfolder.

To be clear:

1. I want each project to be visible to the public, but not listed on my main website pages.

2. I do not want to register multiple sites to hold specifically rated content.

Any ideas on a solution?

W. Eless

2 Answers

You phrase your question as more of a private home-use problem than an enterprise/professional one, but the same concern applies in both cases.

In general, you use your website to publish information that you want to be disclosed and found.

  • Simply don't publish things you don't want to be public in a public space (on your website, or online at all).

  • Add access restrictions to your content to make it available only to authorized users (for instance, username/password protection).

  • Publish internal information only on your intranet; the hostnames don't have to exist in your public DNS zone, so any URLs accidentally posted online will automatically fail for external users (a sketch of this follows the list).
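To make that last point concrete, here is a minimal sketch of an Apache virtual host bound to a hostname that exists only in internal DNS; the hostname and paths below are hypothetical placeholders, not anything from your setup:

 # Sketch only: "projects.internal.example" resolves solely via internal DNS,
 # so any URL using it that leaks onto the public internet simply fails to
 # resolve for external users.
 <VirtualHost *:80>
         ServerName projects.internal.example
         DocumentRoot /var/www/internal-projects
 </VirtualHost>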

To follow up on your edit:
You can't really prevent people from posting links to your pages.

You can avoid having (some) pages, media and other content indexed and appearing in search results by a combination of:

  • Publish different content in different places. Separate (sub)domains such as project.example.com or www.example.net, rather than a URL path on your main site such as example.com/project, may make it easier for search engines, content filters and users to recognize, filter or exclude the content of a specific project.
  • Most spiders and web crawlers will honor settings/restrictions placed in a robots.txt file at the root of your (sub)domain, and will not include (within some limits) content you request to be excluded from search results (a sketch follows this list). See https://support.google.com/webmasters/answer/6062608?hl=en
  • You can add access controls that (attempt to) recognize web crawlers and block their access completely, so nothing should be indexed.
  • Add metadata to your HTML pages (such as <meta name="robots" content="noindex,nofollow">).
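As referenced in the list above, a minimal robots.txt sketch, assuming (purely for illustration) that the adult-rated projects live under a path like /projects/private/:

 # robots.txt at the root of the (sub)domain
 # Well-behaved crawlers will skip this path; the file is advisory only.
 User-agent: *
 Disallow: /projects/private/

Keep in mind that robots.txt is itself publicly readable, so listing a path there also advertises its existence.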
HBruijn
  • Thanks H - It's not as simple as "don't publish" as all content is legitimately something of interest to my audience. I've edited my question for clarification. – W. Eless Sep 30 '19 at 09:25

I have a similar problem with the beta and lab sites of our company. Whether we want them to be accessed only by internal users or by selected external users, we don't want them indexed by Google or other spiders: some of these pages are not fully secured, and exposing them to search engines would give too much help to bad guys.

I have put in place a simple protection against spiders and random lurkers. By itself it is not complete protection against deliberate intrusion attempts (those I guard against with other tools).

It is simply a matter of changing either /etc/apache2/apache2.conf or the relevant files in /etc/apache2/sites-available/ so that they require an authentication login/password. (This can also be done with an .htaccess file in the root directory of the site, if enabled by the correct AllowOverride option in the corresponding .conf file.)

A typical implementation would be:

 <Directory /var/www/html/>
         ...

         # protection IP / password
         <RequireAny>
                 <RequireAll>
                         AuthUserFile /var/secure/.htaxes
                         AuthName "Are you a subscriber?"
                         AuthType Basic
                         Require valid-user
                 </RequireAll>
                 Require ip IP1 IP2
                 Require ip ::1 127.0.0.1
         </RequireAny>
         ## end protection

         ##      Require all granted
 </Directory>

where the optional line Require ip IP1 IP2 allows you to whitelist some IP addresses (e.g. some internal users), and where the authorized logins and passwords are stored in the file /var/secure/.htaxes, created with htpasswd.
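For completeness, a sketch of creating that password file with htpasswd (the user names below are placeholders):

 # create the file and add the first user (-c creates/overwrites the file)
 htpasswd -c /var/secure/.htaxes alice
 # add further users without -c, so the existing file is not overwritten
 htpasswd /var/secure/.htaxes bob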

Fibo
  • Thanks Fibo - But this is only really a solution if you know the IP addresses of people you want to allow or not allow. I've edited my question to make it a bit clearer. – W. Eless Sep 30 '19 at 10:44
  • There are 2 problems: 1 - allowing legitimate users to access your site, 2 - preventing search engines from indexing your pages. _I know of_ no way to implement 2/ if your site is freely visible to spiders. My suggestion makes 1/ harder to satisfy, and IP whitelisting is impossible in that case. You might consider implementing it anyway, of course **after previously communicating** with your users. Note that you can use *a single combination* of login/password, e.g. putting for _AuthName_ something like "Subscriber?" and granting access to those answering "yes" – Fibo Sep 30 '19 at 11:04
  • Note that, as **HBruijn** mentioned, _**most** spiders and web crawlers will honor settings/restrictions_. This implies that bad-guy spiders will probably **not** respect any restrictions placed in robots.txt. Robots.txt files are in fact not binding; they say _"there are pages here I would prefer not to be indexed"_, but they are not an absolute restriction forbidding access to any file / directory. To some extent, _they are even an invitation for bad guys!_ – Fibo Oct 07 '19 at 15:11