
I'm using the HTML5 Boilerplate build script on a new project that I've just deployed to a staging environment. The script works like a charm; it's well documented, so it was easy to configure for use in my application.

After reading through the documentation I decided to use Paul Irish's approach for VCS-based deployment to point to the /publish directory, using this snippet from his documentation in my .htaccess file:

RewriteEngine On
RewriteCond $1 !^publish/
RewriteRule ^(.*)$ publish/$1 [L]

I have it configured like this for my particular setup, and everything points to the minified and concatenated files just like it should. This is great, but the /publish directory is also browsable directly by going to http://[mysite.com]/publish/

This seems like kind of a loose thread to leave dangling. I'm wondering if anyone here has run into this and come up with a good solution. I'm not expecting users to type in /publish/ after the URL, but I wouldn't want it to be crawlable for sure, and it just seems a little sloppy to leave it like that.

Any ideas?

Thanks in advance

Update: after much-appreciated help from Gerben, below, I ended up changing my thinking on this a bit. There is no need to redirect users from /publish to the root URL, because users won't be typing in /publish and there will never be any links to [site.com]/publish. Instead, I've added the following rule to the .htaccess file within the /publish directory. It returns a 403 (Forbidden) error for any direct request to the publish subdirectory, using the F flag documented at http://httpd.apache.org/docs/current/rewrite/flags.html#flag_f

RewriteCond %{THE_REQUEST} publish
RewriteRule .? - [F]

In addition, I've added the publish directory to robots.txt just to be sure search bots aren't indexing two sets of files which contain the same data.

pthompson
  • You should upload the /publish folder to the web-root. – Gerben May 21 '12 at 13:01
  • That would be the simple solution. I left out a key point in my question which precludes me from setting things up the way you propose... I maintain all of my web applications as git repositories to keep everything in sync and under version control. I considered using git sparse-checkout but really, I'd prefer to keep my local, hosted, staging, and prod repos completely synced up with each other so as to avoid confusion. – pthompson May 21 '12 at 16:31

2 Answers


Seems I misread your question. I think the following would redirect anything under /publish/ back to the root folder.

RewriteCond %{THE_REQUEST} " /publish/"
RewriteRule ^publish/(.*) /$1 [R=302,L]

To be sure, I would probably also add /publish to my robots.txt as disallowed, just in case you accidentally remove the .htaccess or something.

Explanation: The RewriteRule checks that the requested URL starts with publish/___ and redirects those URLs to /___. But to distinguish between direct requests to /publish and URLs internally rewritten to /publish, you'll need to examine the originally requested URL. The only way to get at that is via the THE_REQUEST variable. That variable should contain something like GET /publish/___ HTTP/1.1 for direct requests. So the RewriteCond checks for the presence of <space>/publish/
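
To make the ordering concrete, here is a minimal sketch of how the redirect and the rule from the question could sit together in the root .htaccess. It assumes mod_rewrite is enabled and that no .htaccess inside /publish/ has its own rewrite configuration; by default Apache does not inherit the parent directory's rewrite rules in that case, which may or may not be what happened in the asker's setup. The file name app.js in the comments is only an illustration.

```apache
# Root .htaccess (sketch, not a tested drop-in for any particular server).
RewriteEngine On

# 1. Externally redirect direct requests such as "GET /publish/app.js HTTP/1.1".
#    THE_REQUEST holds the raw request line, which internal rewrites never change,
#    so URLs that rule 2 has rewritten into publish/ do not match this condition.
RewriteCond %{THE_REQUEST} " /publish/"
RewriteRule ^publish/(.*) /$1 [R=302,L]

# 2. Internally map everything else onto publish/ (the rule from the question).
RewriteCond $1 !^publish/
RewriteRule ^(.*)$ publish/$1 [L]
```

With this order, a request for /app.js is served from publish/app.js without a redirect, while a direct request for /publish/app.js should be sent back to /app.js once and then served by rule 2.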

EDIT: final attempt:

RewriteBase /
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^publish/(.*) /$1 [R=302,L]
Gerben
  • Thanks for the response, Gerben. I know very little about apache config syntax... Looking at the apache mod_rewrite documentation, it seems like this 302 rewrite will do exactly what I want it to. I put it after the snippet in my question and unfortunately it doesn't seem to have any effect. When I browse to site.com/publish, there's no redirect happening. -- also, I'm a little bit confused about what the 302 hijack redirect is doing... reading up on that now. – pthompson May 21 '12 at 18:53
  • Seems like `\b` didn't work. I changed the code above to use a normal space. I checked the http spec to confirm that only a space is allowed there, so \b was kind of overkill. – Gerben May 21 '12 at 19:38
  • Hey Gerben. Thanks for the explanation -- very helpful. I tried this in a few different configurations and it's still yielding the same results... user can still visit /publish/ without any redirect occurring. I've been in contact with the html5bp team to see if they've run into this and if they have any solutions. – pthompson May 21 '12 at 23:58
  • That's rather strange. I had it working on my test server. I'll check again when I'm back from work. – Gerben May 22 '12 at 07:39
  • Works just fine on my test server. What happens if you remove the rewritecond; do you get the "The page isn't redirecting properly" message? What happens if you use just, `RewriteCond %{THE_REQUEST} /publish/` or even `RewriteCond %{THE_REQUEST} publish`? – Gerben May 22 '12 at 18:15
  • Unfortunately I still haven't gotten this to work. If I remove the rewrite condition (leaving in the rewrite condition and rule that I put in my original question), nothing happens any differently. If I change the rewrite condition's pattern to "publish", " publish" or "/publish", or any combination thereof, it doesn't seem to have any effect. This is true on my local dev environment, and on two different remote Apache servers. I'm continuing to ask around about this. If I find any more info or have any updates I'll be adding them to this thread. – pthompson May 24 '12 at 17:53
  • Try putting it in front of the other code, or behind it. I'm kind of lost here. – Gerben May 24 '12 at 21:34
  • Yeah me too! I've tried changing the order and also tried using a rewrite rule in an htaccess file within the subdirectory. I've ended up with either no effect at all, or a redirect loop. – pthompson May 25 '12 at 00:51
  • Thanks so much - I had tried adding `RewriteBase /` and it didn't yield anything different. I can't seem to get this to work without generating a redirect loop. What I've done instead is added this rule in /publish/.htaccess: `RewriteRule .? - [F]`. Now if a user browses to /publish, they get a 403, which is fine by me. [site.com/] is still serving up the files from /publish as I intended. This is a reasonable solution, I think. I'm not sure if or how all of this will affect indexing for search bots though. – pthompson May 25 '12 at 19:46
  • Not sure how that code would work, but as long as it is working... Indexing should be fine. Just add /publish to your robots.txt just to be sure. Good luck. – Gerben May 25 '12 at 19:50

So the solution I ended up with was to throw users a 403 (forbidden) error on the odd chance that they happen to stumble on [site.com]/publish. Here's how I did it:

In the root directory's .htaccess file, I kept this rule from the h5bp documentation:

RewriteCond $1 !^publish/
RewriteRule ^(.*)$ publish/$1 [L]

In the /publish directory's .htaccess file I added this rule, with the F flag (forbidden):

RewriteCond %{THE_REQUEST} publish
RewriteRule .? - [F]
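
One caveat with the condition above: the unanchored pattern `publish` matches anywhere in the request line, so a URL that merely contains that word (a hypothetical /publisher.html, say) would also get a 403. A tighter variant, offered only as a sketch that I have not tested against this exact setup, anchors the match to the start of the requested path:

```apache
# /publish/.htaccess (hypothetical tighter variant of the rule above).
RewriteEngine On

# Match only request lines whose path starts with /publish, for example
# "GET /publish HTTP/1.1", "GET /publish/ HTTP/1.1" or "GET /publish?x=1 HTTP/1.1".
# Requests rewritten internally from the root keep their original request line
# (such as "GET /app.js HTTP/1.1"), so they do not match and are served normally.
RewriteCond %{THE_REQUEST} "^[A-Z]+ /publish[/? ]"
RewriteRule .? - [F]
```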

I hope this is helpful for anyone else who runs into this problem!

pthompson