I have been working on crawling web pages and extracting the structural elements of each site. For example, given a website, the crawler should return sections such as the header, menu, footer, and content.

I was thinking it would be great to use machine learning so that the code can learn how to classify websites.

I looked at Python machine learning libraries (e.g., PyBrain), but the examples are very complex. Can anyone suggest a library and a tutorial with some simple examples for getting started with machine learning in Python?

Thanks!

Jamal

1 Answer

MLPy may be a simpler place to start. Here is a link to the documentation on classification. By the way, if you don't know in advance what the classes should look like, you may need to cluster your pages rather than classify them.
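
To make the classify-versus-cluster distinction concrete, here is a minimal sketch in plain Python. It does not use MLPy's API; it swaps in two simple techniques (a nearest-centroid classifier and k-means) written from scratch. The two features per page block (relative vertical position and link density) and all the sample values are made up for illustration, so treat the names and numbers as assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


# --- Classification: labels are known up front -------------------------
def train_nearest_centroid(samples):
    """samples: list of (features, label). Returns {label: centroid}."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}


def classify(model, features):
    """Return the label whose centroid is closest to features."""
    labels = list(model)
    return labels[nearest(features, [model[name] for name in labels])]


# --- Clustering: no labels, only a guess at the number of groups -------
def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means; returns one cluster index per input point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return [nearest(p, centers) for p in points]


# Hypothetical features: (relative vertical position, link density).
labeled_blocks = [
    ((0.02, 0.30), "header"), ((0.05, 0.25), "header"),
    ((0.10, 0.90), "menu"), ((0.12, 0.85), "menu"),
    ((0.50, 0.10), "content"), ((0.55, 0.05), "content"),
    ((0.97, 0.60), "footer"), ((0.95, 0.55), "footer"),
]

model = train_nearest_centroid(labeled_blocks)
predicted = classify(model, (0.08, 0.88))  # a link-heavy block near the top

# Clustering the same blocks without their labels.
unlabeled = [features for features, _ in labeled_blocks]
cluster_ids = kmeans(unlabeled, k=4)
```

With classification you must supply labeled examples first; with clustering you only get anonymous group numbers and have to decide afterwards what each group means. A real library would give you better algorithms than this, but the workflow is roughly the same.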

cyborg