I need to rewrite a bunch of urls (about 100 or so) for SEO purposes, and there may be more being added in the future (probably another 50-100 later on). I need a flexible way of doing this and so far, the only way I can think of is to edit the .htaccess file using the rewrite engine.

For example, I have a bunch of urls like the ones below. Please note that the query string is irrelevant and dynamic; it could be anything, and I'm only using these as examples. I am only focusing on the pathname, the part between the hostname and the query string:

http://example.com/seo_term1?utm_source=google&utm_medium=cpc&utm_campaign=seo_term
http://example.com/another_seo_term2?utm_source=facebook&utm_medium=cpc&utm_campaign=seo_term
http://example.com/yet_another_seo_term3?utm_source=example_ad_network&utm_medium=cpc&utm_campaign=seo_term
http://example.com/foobar_seo_term4
http://example.com/blah_seo_term5?test=1

etc...

And they are all being rewritten to (for now): http://example.com/

What's the most efficient/effective way of doing this so that I may be able to add more terms in the future?

One solution I came across is to do this (in the .htaccess file):

RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ / [NC,QSA]

However, the problem with this solution is that even invalid urls (such as http://example.com/blah) will be rewritten to http://example.com instead of giving a 404 code (which is what it is supposed to do anyway). I'm still trying to figure out how all this works, and the only way I can think of is to write 100 more RewriteCond statements (such as: RewriteCond %{REQUEST_URI} =/seo_term1 [NC,OR]) before the RewriteRule directive. For example:

RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} =/seo_term1 [NC,OR]
RewriteCond %{REQUEST_URI} =/another_seo_term2 [NC,OR]
RewriteCond %{REQUEST_URI} =/yet_another_seo_term3 [NC,OR]
RewriteCond %{REQUEST_URI} =/foobar_seo_term4 [NC,OR]
RewriteCond %{REQUEST_URI} =/blah_seo_term5 [NC]
RewriteRule ^(.*)$ / [NC,QSA]

But that doesn't sound very efficient to me. Is there a better way?

James Nine

2 Answers

The first improvement you can make is that you don't need the RewriteCond lines at all.

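RewriteRule ^/?seo_term1$ / [NC,QSA]

This does exactly what a RewriteCond on %{REQUEST_URI} followed by the catch-all RewriteRule is doing now. The pattern is anchored and the leading slash is optional because in .htaccess the path is matched without it.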
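
As a rough sketch of how that scales to several terms, the paths from the question can be folded into a single alternation. This assumes the rule stays in the .htaccess file at the document root, where the path is matched without its leading slash; anything not listed is left alone and can 404 as usual.

```apache
RewriteEngine on

# One rule, one alternation: add new terms between the parentheses.
# The query string is passed through unchanged by default, because the
# substitution ("/") has no query string of its own, so QSA is not needed.
RewriteRule ^(seo_term1|another_seo_term2|yet_another_seo_term3|foobar_seo_term4|blah_seo_term5)$ / [NC,L]
```

With 100 or more terms the line gets long, but it is still one regex match per request rather than 100 separate conditions.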

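The second improvement you could make is to use a RewriteMap. The map file itself can be updated without restarting Apache, although the RewriteMap directive has to be declared in the server or virtual host configuration rather than in .htaccess.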

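RewriteMap seo txt:/etc/apache2/maps/seo.txt
# Skip the rule when the path has no entry in the map (the lookup returns an empty string)
RewriteCond ${seo:$1} !=""
RewriteRule ^(.*)$ ${seo:$1} [NC,QSA]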

and seo.txt contains

/seo_term1 /
/seo_term2 /

Note: I haven't actually used a RewriteMap for several years. The above config may need some tweaking due to my imperfect memory.
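
For what it's worth, here is a slightly fuller sketch of the map approach, with a few assumptions of mine: it sits in the server or virtual host config (RewriteMap is not allowed in .htaccess), the map name `lc` is made up, and the keys in seo.txt are lowercase with a leading slash as shown above. The [NC] flag only affects pattern matching, not map lookups, so the path is lowercased before the lookup. Treat it as untested.

```apache
# Server or virtual host context, not .htaccess.
RewriteEngine on

# Text map with the SEO paths, plus the built-in lowercase function.
# "lc" is a hypothetical name; the file path is the one from the answer.
RewriteMap seo txt:/etc/apache2/maps/seo.txt
RewriteMap lc  int:tolower

# Only rewrite when the lowercased path has an entry in the map.
# Unlisted paths give an empty lookup, fail this condition, and are
# left alone so they can still 404.
RewriteCond ${seo:${lc:$1}} !=""

# PT hands the rewritten URL-path back to normal URL mapping.
RewriteRule ^(.*)$ ${seo:${lc:$1}} [PT]
```

Adding a term later is then a one-line change to seo.txt; mod_rewrite re-reads the file when its modification time changes.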

Ladadadada
  • Thanks, this is sort of what I was looking for... unfortunately it didn't work. I keep getting 400 errors for anything in the map text file. I tried to look up some more examples for RewriteMap but there's nothing that seems to explain this in further detail. Thanks for trying at least. – James Nine Jun 24 '12 at 22:43

A regex should be pretty capable of pulling this off.

RewriteEngine on
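# REQUEST_URI never contains the query string, so test QUERY_STRING instead
RewriteCond %{QUERY_STRING} ^(?=.*utm_source=(google|msn|yahoo))(?=.*utm_medium=(cpc|ppc))(?=.*utm_campaign=[a-zA-Z0-9._-]+)
# In .htaccess the root path is empty, so .+ skips it and the redirect cannot loop
RewriteRule ^.+$ / [L,R=301]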

The above would match only a string that contains all the parameters specified, regardless of the leading (pre-?) string.

Edit ...

Okay, you've changed your question quite a bit now. But thankfully, it's even more straightforward.

RewriteEngine on
# Anchor both ends so /seo_term1blah still 404s; REQUEST_URI excludes the query string
RewriteCond %{REQUEST_URI} ^/(seo_term1|seo_term2)$
RewriteRule ^(.*)$ / [L,R=301]

Just change/edit/add values as necessary.

Ben Lessani
  • utm_source won't always be "google" and utm_medium won't always be "cpc", those were just examples. Also, what if someone puts an invalid value anyway (due to malicious reasons or what not), e.g. `http://www.example.com/blahblah?utm_source=facebook&utm_medium...` I would want this to 404. – James Nine Jun 24 '12 at 18:05
  • Well list what all the possible `utm_source`s are and what all the `utm_medium`s are then! – Ben Lessani Jun 24 '12 at 18:09
  • I think I will re-word my question then; I really only care about the pathname, not the query string. The query string is totally dynamic and could even end with something like ?test=1 for example. – James Nine Jun 24 '12 at 18:13