
I'm trying to test a site out using the IBM Watson Natural Language Understanding service. I'm doing so using this tool (https://natural-language-understanding-demo.mybluemix.net/) and entering a URL from our site to test.

Using our production servers (https://www.knox.edu), I get the following error for every page of the site.

{code: 400, error: "attempt to fetch failed: :closed"}

Using a test server of the same site (https://cmstest.knox.edu/test), it all works fine though.

What would be causing the errors from our production server?

Thanks!

2 Answers


This error is typically caused by a site's robots.txt preventing the Watson NLU service from scraping the URL.

Check your robots.txt file to see if it's blocking user-agents (perhaps globally).

There's some additional info from a discussion of this error using the Python SDK here: https://github.com/watson-developer-cloud/python-sdk/issues/199
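One quick way to check whether a robots.txt policy would block a given crawler is Python's standard `urllib.robotparser`. A minimal sketch follows; the rules and the `watson-url-fetcher` user-agent string are illustrative assumptions, not a confirmed Watson agent name:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block everything globally, but allow
# one specific crawler (both the policy and the agent name are
# made up for illustration).
robots_txt = """\
User-agent: *
Disallow: /

User-agent: watson-url-fetcher
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(robots_txt)

# The global rule blocks unknown agents...
print(rp.can_fetch("*", "https://www.knox.edu/"))                   # False
# ...while the explicitly allowed agent gets through.
print(rp.can_fetch("watson-url-fetcher", "https://www.knox.edu/"))  # True
```

If the allowed user-agent string doesn't exactly match what the service actually sends, the crawler falls back to the global `Disallow` rule, which would produce a fetch failure like the one above.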

tmarkiewicz
  • Thanks. I did see that the robots.txt file could cause this, but I specifically allowed the Watson crawler. See the robots file here -> [link](https://www.knox.edu/robots.txt) – James Stevens Jun 27 '17 at 21:36

It looks like NLU has updated its crawling engine; the website you mentioned is now crawlable from NLU. When I ran the categories call, I received the following output:

{
  "categories": [
    {
      "score": 0.999469,
      "label": "/education/graduate school/college"
    },
    {
      "score": 0.497251,
      "label": "/law, govt and politics/legal issues/legislation/tax laws"
    },
    {
      "score": 0.466882,
      "label": "/travel/tourist destinations/africa"
    }
  ]
}
Hexfire