
I am using Nutch 2.2.1. The log file shows the following error:

ERROR protocol.RobotRulesParser - Agent we advertise (nutch-spider-2.2.1) not listed first in 'http.robots.agents' property!

The relevant property in my nutch-site.xml is:

<property>
<name>http.agent.name</name>
<value>nutch-spider-2.2.1</value>
</property>

The same property in my nutch-default.xml is:

<property>
<name>http.agent.name</name>
<value></value>
</property>

Where is the actual problem? Please explain it clearly. This question has already been posted here, but I am posting it again so that I can put a bounty on it if needed.

Hafiz Muhammad Shafiq

1 Answer


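Setting http.agent.name alone is not enough. Nutch also checks that the same name is the first entry in the "http.robots.agents" property, and as far as I know the default for that property in nutch-default.xml is just *, so the check fails and the error is logged. You should add the "http.robots.agents" property to your nutch-site.xml, put the value of http.agent.name as the first agent name, and keep the default * at the end of the list, like this: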

<property>
     <name>http.robots.agents</name>
     <value>nutch-spider-2.2.1,*</value>
</property>
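
For context, here is how the two properties could sit together in nutch-site.xml. This is only a sketch: the `<configuration>` root element is assumed from the usual Hadoop-style layout of Nutch config files, and any other properties you already have there would stay alongside these two.

```xml
<?xml version="1.0"?>
<!-- Sketch of nutch-site.xml; the <configuration> wrapper is an assumption
     based on the usual Hadoop-style config layout. -->
<configuration>
  <!-- The agent name the crawler advertises (from the question). -->
  <property>
    <name>http.agent.name</name>
    <value>nutch-spider-2.2.1</value>
  </property>
  <!-- Same name listed first, then the * wildcard (from the answer). -->
  <property>
    <name>http.robots.agents</name>
    <value>nutch-spider-2.2.1,*</value>
  </property>
</configuration>
```

Overriding both values in nutch-site.xml, rather than editing nutch-default.xml, keeps the shipped defaults untouched.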
hqc