I downloaded the source code from here. I tried to run the example from chapter 4 of the book 'Programming Collective Intelligence' by Toby Segaran. My Python version is 2.7.2. I typed this code into the interpreter:

import searchengine
pages=['http://en.wikipedia.org/wiki/Programming_language']
crawler = searchengine.crawler('searchindex.db')
crawler.crawl(pages)

And I get this message:

Could not open http://en.wikipedia.org/wiki/Programming_language

Or sometimes I get this message:

Indexing http://en.wikipedia.org/wiki/Programming_language
Could not parse page http://en.wikipedia.org/wiki/Programming_language

In short, the crawler doesn't index the page. What am I doing wrong?

stoneyang
Helio Gracie

1 Answer

In searchengine.py, rename def separateWords(self,text) to separatewords (lowercase w), and in gettextonly(self,soup) change v==Null to v==None, because Python has no Null. You also have to create the index tables before crawling, so run steps like

>>> crawler=searchengine.crawler('searchindex.db')
>>> crawler.createindextables()
>>> crawler=searchengine.crawler('searchindex.db')

first, then run pages=['***'] and the remaining steps.
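
To see why each of these matters, here is a minimal sketch that reproduces the three mistakes in isolation. It is not the book's code: the Demo class, the describe_failures function and the urllist table name are stand-ins chosen for illustration. If the crawl loop catches every exception and only prints "Could not parse page" (an assumption about the downloaded source, not something checked here), errors like these would be hidden behind that message.

```python
import sqlite3


class Demo(object):
    """Hypothetical stand-in for the crawler class, for illustration only."""

    def separateWords(self, text):  # defined with an uppercase W
        return text.lower().split()


def describe_failures():
    """Return the name of the exception raised by each mistake."""
    failures = {}

    # 1. Python has no Null; looking up the name fails before any comparison.
    try:
        v = None
        v == Null  # deliberately undefined name
    except NameError as exc:
        failures['null'] = type(exc).__name__

    # 2. Attribute names are case sensitive: separatewords != separateWords.
    try:
        Demo().separatewords('Some Text')
    except AttributeError as exc:
        failures['case'] = type(exc).__name__

    # 3. Writing to a table that was never created (illustrative table name).
    con = sqlite3.connect(':memory:')
    try:
        con.execute("insert into urllist(url) values ('x')")
    except sqlite3.OperationalError as exc:
        failures['table'] = type(exc).__name__
    finally:
        con.close()

    return failures


if __name__ == '__main__':
    print(describe_failures())
```

Temporarily removing a broad except around the indexing call is usually the quickest way to see which of these is actually being raised.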

Ben
Swan