I am attempting to create some 301 redirects using .htaccess in order to fix some crawl errors that Google is reporting. Google is still requesting old versions of some of my URLs, which causes errors when the spider attempts to access the legacy paths. These legacy URLs contain spaces, often several of them, at different positions within the URL string. For instance, a URL might have looked like this:

http://www.example.com/car-kits-halogen-aircon-oil/773 CAR 773-halogen-aircon-oil

These URLs are needlessly long, especially since they ultimately all displayed identical content. They have now been shortened to something like this:

http://www.example.com/773-halogen-aircon-oil

There are literally over 1,000 of these, although there are only 6 permutations of the end of the URL, so I thought I could use either RedirectMatch or RewriteRule to easily match whether one of those 6 permutations existed and redirect to the appropriate new URL. I have tried the following:

RedirectMatch .*/773[\s]?(%20)?CAR[\s]?(%20)?773-halogen-aircon-oil$ http://www.example.com/773-halogen-aircon-oil

as well as:

RewriteEngine On
RewriteBase /
RewriteRule .*/773[\s]?(%20)?CAR[\s]?(%20)?773-halogen-aircon-oil$ http://www.example.com/773-halogen-aircon-oil [R=301,L]

These are not working. I have tried many different options, including escaping the spaces with a \ instead of the regex character class, changing the beginning of the regex to ^.*/?773 and many other things, but nothing works.

I have used the regex test feature at http://www.regular-expressions.info/javascriptexample.html and it reports that my regular expression is valid and generates a match against the URLs I expect it to, but .htaccess is not redirecting as I expect.

I can use a plain Redirect as follows:

Redirect 301 "/car-kits-halogen-aircon-oil/773 CAR 773-halogen-aircon-oil" http://www.example.com/773-halogen-aircon-oil

This works, but it is problematic, because it matches only when car-kits-halogen-aircon-oil is present, and there are probably 200 permutations of that URI segment, which would make manually entering every possible permutation a huge undertaking.

Any suggestions? Is what I am trying to do even possible?

  • It looks like you're trying to accomplish something like this: `RewriteRule ^/.*-halogen-aircon-oil/([0-9]+).*$ http://www.example.com/$1-halogen-aircon-oil [R=301,L]` Is that the general format? – Kevin Stricker Jan 27 '11 at 03:39
  • The problem is that the first segment had about 200 permutations, most of which may have had nothing to do with the second segment. So matching against the first segment won't work, since it could be - for all intents and purposes - anything. – Lance Johnson Jan 27 '11 at 07:13
  • So, for example, the old URL could have been "example.com/xxx-yyy-zzz/773 CAR 773-halogen-aircon-oil" or "example.com/aaa-bbb-ccc/773 CAR 773-halogen-aircon-oil" and both should now become "example.com/773-halogen-aircon-oil" – Lance Johnson Jan 27 '11 at 07:17

1 Answer

The rule you are trying seems to be a bit complicated and it would be better to have more examples.

RewriteEngine on
RewriteRule ^.*/773%20CAR%20(.*)$ http://www.example.com/$1 [R=301,L]

Should do the job.
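One thing worth checking if the `%20` version does not match: mod_rewrite compares the pattern against the URL path after percent-decoding, so the space normally arrives as a literal space rather than as `%20`. The following is only a sketch of the same rule under that assumption. It also assumes the .htaccess file sits in the document root (so the matched path has no leading slash) and that exactly one segment precedes the legacy part.

```apache
RewriteEngine On

# Assumption: this file lives in the document root, so the path being
# matched looks like "some-segment/773 CAR 773-halogen-aircon-oil".
# The pattern is wrapped in quotes so that it can contain literal spaces.
RewriteRule "^[^/]+/773 CAR (773-halogen-aircon-oil)$" http://www.example.com/$1 [R=301,L]
```

Quoting the pattern avoids relying on how a backslash-escaped space is parsed, which differs between directives.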

karlcow
  • This looks like something that should work, but it doesn't. All of the URLs on the site now take the form of http://www.example.com/segment_X whereas they used to take the form http://www.example.com/segment_Y/segment_Z. Segment_Y has about 200 or so permutations, and segment_Z has only 6 permutations. Segment_Z was always: – Lance Johnson Jan 27 '11 at 07:01
  • something like "773 CAR 773-halogen-aircon-oil" or "G2 KPD g2-gen-note-list". In the new system, the segment_X will always be one of the 6 permutations of segment_Z, but it needs to lose the first two elements and whitespace ("773 CAR " or "G2 KPD ", etc.). Make sense? – Lance Johnson Jan 27 '11 at 07:10
  • Is there always the sequence of characters "letters+numbers 3letters"? Could you give a sample of more URIs and the final form you want? – karlcow Jan 27 '11 at 11:41
  • Here are the 6 segment_Z permutations: "773 CAR 773-halogen-aircon-oil", "G2 KPD g2-gen-note-list", "892 DIK 892-kitted-lights-conversions-glass", "R6m KPD r6m-gen-note-list", "G3 LKV g3-gen-note-list", "B4 HID b4-kitted-lights-conversions-glass". I'm happy to have 6 match rules if needed, but the main issue is that I need segment_Y to disappear no matter what, and segment_Z to remain intact, simply without the ("letters+numbers 3letters ") pattern. I hope this is helpful. In the meantime, I wrote a PHP script to write separate Redirect rules for each crawl error, but more crop up every day. – Lance Johnson Jan 27 '11 at 18:37
  • So example.com/aaa-bbb-ccc/R6m KPD r6m-gen-note-list and example.com/ddd-eee/R6m KPD r6m-gen-note-list should both redirect to example.com/r6m-gen-note-list. Likewise example.com/zzz-yyy-xxxx-www/G3 LKV g3-gen-note-list and example.com/vvv-uuu-ttt/G3 LKV g3-gen-note-list should both resolve to example.com/g3-gen-note-list and so on. – Lance Johnson Jan 27 '11 at 18:39
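Based on the six segment_Z values listed in the comments, a single rule might be able to cover all of them. This is a minimal, untested sketch, not a confirmed fix: it assumes the .htaccess file is in the document root, that segment_Y is exactly one path segment, that the prefix is always "letters/digits, space, three capital letters, space", and that no other rewrite rules handle the request first.

```apache
RewriteEngine On

# Path as mod_rewrite sees it in a document-root .htaccess (decoded, no
# leading slash), for example: "aaa-bbb-ccc/R6m KPD r6m-gen-note-list"
#   [^/]+          segment_Y, whatever it is
#   [A-Za-z0-9]+   first prefix token, e.g. "773", "G2", "R6m"
#   [A-Z]{3}       second prefix token, e.g. "CAR", "KPD", "HID"
#   ([a-z0-9-]+)   the part to keep, captured as $1
RewriteRule "^[^/]+/[A-Za-z0-9]+ [A-Z]{3} ([a-z0-9-]+)$" http://www.example.com/$1 [R=301,L]

# Possible mod_alias alternative (use one or the other, not both).
# RedirectMatch sees the path with its leading slash:
# RedirectMatch 301 "^/[^/]+/[A-Za-z0-9]+ [A-Z]{3} ([a-z0-9-]+)$" http://www.example.com/$1
```

With this pattern, example.com/ddd-eee/R6m KPD r6m-gen-note-list would be expected to redirect to example.com/r6m-gen-note-list. If the match needs to be stricter, the final group could be replaced with an alternation of the six known endings.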