
I have a DNN site with over 20,000 pages. Googlebot and Bingbot crawl my website constantly.

When I look at my site log, I can see that Google and Bing are crawling my site via the page ID (e.g. www.url.com/Default.aspx?TabID=5000).

The bots hit my website every minute. When I add a new page, I expect the bots to crawl it; instead I see them re-crawling very old pages, and it takes a couple of hours before they pick up the newly added page.

I have a robots.txt file with over 10,000 entries along the following lines:

Disallow: /Default.aspx?TabID=5000
Disallow: /Default.aspx?TabID=5001
Disallow: /Default.aspx?TabID=5002

and so forth.
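For context: both Google and Bing support the * wildcard and the $ end-of-URL anchor in robots.txt, so where entries share a prefix a single pattern can replace a long run of lines. Neither syntax, however, can express a numeric cutoff such as "every TabID below 20,000". A minimal sketch of the pattern syntax, assuming the same pages are also reachable through friendly URLs (otherwise the first rule would block the whole site):

User-agent: *
# Block every query-string TabID URL with one rule
Disallow: /*?TabID=
# Or block one specific page exactly, using $ to anchor the end of the URL
Disallow: /Default.aspx?TabID=5000$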

So I am noticing a couple of issues:

1 - Googlebot and Bingbot are ignoring my disallows and are re-crawling pages that I have listed in robots.txt. How do the bots know to go back and re-crawl old pages using the TabID?

2 - When I add a new page, both bots are still busy crawling old content and do not read my new content right away. Is there a way to force Googlebot and Bingbot to always read newly added pages first?
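As an aside, a common way to get new URLs noticed sooner is to list them in an XML sitemap and ping the search engines whenever it changes. Both Google and Bing publish a ping endpoint that takes the sitemap URL as a query parameter; the URLs below are from memory, so check the current documentation before relying on them (www.url.com stands in for your domain):

http://www.google.com/ping?sitemap=http://www.url.com/sitemap.aspx
http://www.bing.com/ping?sitemap=http://www.url.com/sitemap.aspx

Neither ping guarantees an immediate crawl, but it does tell the engines that the sitemap has new entries worth fetching.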

Thank you in advance for any suggestions.

Cesar
  • What version of DotNetNuke are you on? Are you using any sitemap providers? Have you checked Webmaster Tools to see if the engines see your robots.txt file? – Chris Hammond Jun 14 '13 at 09:48
  • - Using version 5. - Not using any sitemap providers. - I checked in Webmaster Tools and it is reading the robots.txt file; the problem is that it only seems to allow around 100 disallow lines.
    So I really do not know how else to tell the bots not to check old pages. I want to block everything below page 20,000 (www.url.com/Default.aspx?TabID=20000), and I know I cannot add 20k rows to my robots.txt.
    Any suggestions?
    – Cesar Jun 14 '13 at 13:46
  • Do you want to block *all* page ID URLs? – unor Jun 14 '13 at 23:14
  • No, I do not want to block all page ID URLs. When I publish new pages, I notice that the bots are busy scanning old pages. I would like the robots to turn their attention to the new pages. Not sure if this is even possible. – Cesar Jun 14 '13 at 23:49

1 Answer


If you go to http://URL.com/sitemap.aspx, check to see which pages are listed there.

I would highly recommend upgrading to DNN 7, as it lets you control which pages show up in the sitemap; that may help you get your indexing issues under control.

UPDATE: Under the Admin menu you should find a Search Engine Sitemap page, where you can set a minimum page priority for inclusion in the sitemap. Then, for the pages you don't want to show up, you can lower their priority in the page settings.
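To sanity-check that the filter is working, you can look at the raw XML that sitemap.aspx returns: each entry carries a <priority> element, and pages below the configured minimum should simply stop appearing. A typical entry follows the standard sitemaps.org format and looks roughly like this (URL and values are illustrative only):

<url>
  <loc>http://www.url.com/Default.aspx?TabID=20000</loc>
  <lastmod>2013-06-15</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.5</priority>
</url>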

Chris Hammond
  • Chris, I am unable to upgrade to a newer version because of all the custom code that this application has. I looked at the sitemap.aspx and see thousands of pages. Is there a way for me to choose which pages get shown on the sitemap in this version? – Cesar Jun 15 '13 at 05:09
  • Updated the answer with info – Chris Hammond Jun 15 '13 at 22:42
  • Chris, thanks for your answer. On the Search Engine Site Map page there is an "Exclude URLs with a priority lower than" option with a default value of 0.1. I set all the pages I do not want on the sitemap to priority 0. I will confirm this is working in the next couple of days. Thanks! – Cesar Jun 16 '13 at 07:34
  • So after a couple of days of testing, I am noticing that sitemap.aspx is still listing over 20k links to internal pages. Any other ideas where I could limit the size of this page? – Cesar Jun 20 '13 at 00:22