
Google's mobile-friendly test tool (https://www.google.com/webmasters/tools/mobile-friendly/) says that my site is not optimized for mobile devices, even though it is. The reason given is that robots.txt is blocking a lot of resources. My website is based on Joomla 1.5, but it uses a responsive template.

This is my robots.txt file. Despite the Allow lines, the JS, CSS, and image files still seem to be blocked.

User-agent: *

Allow: /templates/
Allow: /*.js
Allow: /*.css
Allow: /*.jpg
Allow: /*.gif
Allow: /*.png

Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
Disallow: /xmlrpc/
Disallow: /AnexosEmpresas/
Disallow: /Formulario/
Disallow: /estadisticas/
Disallow: /installation-xx/
Disallow: /site2/
Disallow: /TemplateMail/
Disallow: /IMPLEMENTACION/
Disallow: /clicks/
Disallow: /LiveZilla/
Disallow: /*format=feed*
Disallow: /*view=category*
Disallow: /*index.php/*
Disallow: /*option=com_sobi2*
Disallow: /*content/category/*
Disallow: /*start=/*
Disallow: /presentacion_ant/
Disallow: /presentacion/
Disallow: /CronJobs/
Disallow: /plantillas/

Any idea how to unblock the required resources?

user3407397
  • The obvious solution is to simply remove the Disallow lines that correspond to the directories you want to unblock. If there is something preventing you from doing that, please clarify the question to include this information. – plasticinsect May 20 '15 at 20:44
  • The problem with removing the Disallow lines is that there are JS, CSS, and image files in almost every folder, so I would have to remove almost all of the Disallows, which is not a good idea. I just want Google to be able to crawl the JS, CSS, and images inside all folders. – user3407397 May 20 '15 at 23:51

1 Answer


This is happening because Google prioritizes competing Allows and Disallows based on the length of the path. The directive with the longer path wins. If they are the same length, Allow wins over Disallow. This rule is specific to Google. Not all crawlers do it this way.

For example, in the following:

User-agent: *
Allow: /a
Disallow: /aa

/aardvark would be blocked (for Google), because "/aa" is longer than "/a", so the Disallow has precedence over the Allow.

In:

User-agent: *
Allow: /aa
Disallow: /a

/aardvark would not be blocked, because the Allow has the longer path.

For purposes of this rule, a wildcard is counted as just one more character. For example, in this:

User-agent: *
Allow: /a*
Disallow: /aa

/aardvark would not be blocked, because "/a*" is the same length as "/aa" (even though "/a*" is functionally identical to "/a", which is shorter).
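
If you want to play with this precedence logic, here is a rough Python sketch of it. This is my own illustration, not Google's actual parser: the rule_matches and is_allowed helpers are made-up names, and details such as the $ end anchor and percent-encoding are ignored.

import re

def rule_matches(path, pattern):
    # Treat '*' as "any run of characters"; everything else is literal.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.match(regex, path) is not None

def is_allowed(path, allow_rules, disallow_rules):
    # The longest matching path wins; on a tie, Allow beats Disallow.
    allow_len = max((len(r) for r in allow_rules if rule_matches(path, r)), default=-1)
    disallow_len = max((len(r) for r in disallow_rules if rule_matches(path, r)), default=-1)
    return allow_len >= disallow_len

print(is_allowed("/aardvark", ["/a"], ["/aa"]))   # False: "/aa" is longer than "/a"
print(is_allowed("/aardvark", ["/aa"], ["/a"]))   # True: the Allow has the longer path
print(is_allowed("/aardvark", ["/a*"], ["/aa"]))  # True: same length, Allow wins the tie

As far as I know, Python's built-in urllib.robotparser does not implement Google's wildcard and longest-match semantics, which is why the sketch rolls its own matcher.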

How to fix it?

Option 1:

The simplest option is to remove some of the Disallows and accept that Google will crawl some files you don't want it to. This is probably what I would do. It is obviously a compromise, but it's the only option that will actually make your robots.txt file easier to read.

Option 2:

Explicitly allow each file type for each directory that may contain files of that type. For example, this line:

Disallow: /plugins/

would become this:

Allow: /plugins/*.jpg
Allow: /plugins/*.js
Allow: /plugins/*.css
Allow: /plugins/*.gif
Allow: /plugins/*.png
Disallow: /plugins/

The above example will block any file in /plugins/, except when the URL includes one of ".jpg", ".js", ".css", etc.

It will block:

http://example.com/plugins/
http://example.com/plugins/somefile.php
http://example.com/plugins/some/path/somefile.php

It will not block:

http://example.com/plugins/somefile.js
http://example.com/plugins/somefile.jpg
http://example.com/plugins/somefile.css
http://example.com/plugins/whatever.php?file=foo.css

You will have to do this separately for each directory you are blocking.
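
Writing all of those lines by hand for every directory is tedious, so here is a small Python sketch that generates them. The directory list and extensions are just assumptions pulled from the robots.txt in the question; adjust them to whatever you actually block:

# Blocked directories that may contain CSS/JS/image files, and the file
# types Google needs for rendering (both lists taken from the question).
blocked_dirs = ["/components/", "/images/", "/media/", "/modules/", "/plugins/"]
extensions = ["js", "css", "jpg", "gif", "png"]

for directory in blocked_dirs:
    for ext in extensions:
        print("Allow: " + directory + "*." + ext)
    print("Disallow: " + directory)

Paste the output into robots.txt in place of the corresponding Disallow lines.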

Option 3:

Warning: The following is a hack. I have verified that this works, but it relies on undocumented behavior that Google may change in the future. It will almost certainly not work on crawlers other than Google.

You can pad the Allows with multiple trailing wildcards so that they are at least as long as the longest Disallow:

Allow: /*.js***************
Allow: /*.css**************
Allow: /*.jpg**************
Allow: /*.gif**************
Allow: /*.png**************

# Your existing disallows go here.

These will override any Disallow whose path has 20 characters or less. The trailing wildcards have no effect on what will be matched. They only increase priority.
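
If you would rather not count asterisks by hand, here is a quick sketch that generates the padding. The helper is my own and the rule lists below are placeholders; fill in your real Allows and Disallows:

# Pad each Allow with trailing '*' so its path is at least as long as the
# longest Disallow. Ties go to the Allow under Google's rule, so matching
# the longest Disallow's length is enough.
allows = ["/*.js", "/*.css", "/*.jpg", "/*.gif", "/*.png"]
disallows = ["/administrator/", "/*option=com_sobi2*", "/*content/category/*"]

target = max(len(d) for d in disallows)
for a in allows:
    print("Allow: " + a + "*" * max(0, target - len(a)))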

plasticinsect
  • For example, how could I allow the crawler to access this CSS resource: http://www.example.com/plugins/system/cdscriptegrator/libraries/highslide/css/cssloader.php?files%5B%5D=highslide.css? The mobile-friendly test tool says my website is not mobile friendly because resources like this one are blocked. – user3407397 May 21 '15 at 09:18
  • Both option 2 and option 3 will do exactly that. I have edited my examples to hopefully make that a little clearer. – plasticinsect May 21 '15 at 17:29