
Can somebody please tell me which protocol Nutch uses for fetching pages? I want to check what kind of requests Nutch makes.

I used Charles Proxy to inspect the requests, but nothing showed up there. Am I missing something about Charles Proxy or about Nutch?

I have also tried Wireshark, but there were too many packets and I could not identify which ones came from Nutch.

Please help.

Sachin Jain

1 Answer


Nutch is a web crawler, so I guess it uses HTTP, most likely HTTP GET requests to fetch pages.

If you need more information (e.g. the user agent of Nutch), consider setting up an Apache web server on your machine and crawling some test pages. Then have a look at the Apache access log.
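If installing Apache feels like too much for a quick check, the same idea can be sketched with a tiny local server that prints each request's method, path, and User-Agent. This is only a minimal sketch using Python's standard library: the port `8000`, the names `LoggingHandler`, `seen_requests`, and `start_server`, and the test page content are all made up for illustration and have nothing to do with Nutch itself.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Every request the server receives is recorded here as (method, path, user_agent).
seen_requests = []


class LoggingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a small HTML page and logs it."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "(none)")
        seen_requests.append((self.command, self.path, agent))
        print(f"{self.command} {self.path} User-Agent: {agent}")

        # A trivial page with one link, so a crawler has something to follow.
        body = b"<html><body><a href='/page2'>next</a></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress the default stderr access log; do_GET already prints one line.
        pass


def start_server(port=8000):
    """Start the server on localhost in a background thread and return it."""
    server = HTTPServer(("127.0.0.1", port), LoggingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    print("Listening on http://127.0.0.1:8000/ (Ctrl+C to stop)")
    try:
        threading.Event().wait()  # block the main thread until interrupted
    except KeyboardInterrupt:
        server.shutdown()
```

With this running, putting `http://127.0.0.1:8000/` into the seed list and starting a crawl should produce one printed line per request the crawler sends. Expect a request for `/robots.txt` as well if the crawler checks it before fetching pages.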

mana
  • I tried crawling another website and then looked for the requests in Charles. Charles was showing all other requests but not a single request from Nutch. I could not understand the reason. – Sachin Jain Jul 03 '12 at 17:00
  • Have you set Nutch up to use a proxy? Have a look at the corresponding lines of `nutch-site.xml`: [behind this link](http://wiki.apache.org/nutch/SetupProxyForNutch#Configure_Nutch_.28Nutch_1.3.29) – mana Jul 04 '12 at 07:39
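
As a rough sketch of the proxy setup mentioned in the last comment: a tool like Charles only sees traffic that is explicitly routed through it, so Nutch has to be told about the proxy in `nutch-site.xml`. The fragment below assumes Charles is listening on `localhost` port `8888` (its usual default, but check your own Charles settings); both values are assumptions, not something stated in this thread.

```xml
<!-- Goes inside the <configuration> element of conf/nutch-site.xml. -->
<!-- Host and port are assumed values for a local Charles instance. -->
<property>
  <name>http.proxy.host</name>
  <value>localhost</value>
</property>
<property>
  <name>http.proxy.port</name>
  <value>8888</value>
</property>
```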