
I want to index binary files (PDF, Word, plain text) into Elasticsearch. I am using FSCrawler for that, but I get the error below when I run it.

I have followed this guide: https://fscrawler.readthedocs.io/en/latest/user/getting_started.html

Config file (YAML):

---
name: "hello"
fs:
  url: "/home/gowtham/Documents"
  update_rate: "15m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
elasticsearch:
  nodes:
  - url: "http://10.0.2.2:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
  index: "hello"

The directory /home/gowtham/Documents contains a PDF file.

This is the error I get:

12:46:22,477 WARN  [f.p.e.c.f.c.v.ElasticsearchClientV6] failed to create index [hello], disabling crawler...
12:46:22,478 FATAL [f.p.e.c.f.c.FsCrawlerCli] Fatal error received while running the crawler: [Elasticsearch exception [type=illegal_argument_exception, reason=request [/hello] contains unrecognized parameter: [include_type_name]]]
12:46:22,478 DEBUG [f.p.e.c.f.c.FsCrawlerCli] error caught
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=illegal_argument_exception, reason=request [/hello] contains unrecognized parameter: [include_type_name]]
    at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177) ~[elasticsearch-6.7.1.jar:6.7.1]
    at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2053) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
    at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2030) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
    at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1777) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
    at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1734) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
    at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1696) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
    at org.elasticsearch.client.IndicesClient.create(IndicesClient.java:191) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
    at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndex(ElasticsearchClientV6.java:240) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndex(ElasticsearchClientV6.java:603) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndices(ElasticsearchClientV6.java:436) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl.start(FsCrawlerImpl.java:161) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:270) [fscrawler-cli-2.7-SNAPSHOT.jar:?]
    Suppressed: org.elasticsearch.client.ResponseException: method [PUT], host [http://10.0.2.2:9200], URI [/hello?master_timeout=30s&include_type_name=true&timeout=30s], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"request [/hello] contains unrecognized parameter: [include_type_name]"}],"type":"illegal_argument_exception","reason":"request [/hello] contains unrecognized parameter: [include_type_name]"},"status":400}
        at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:936) ~[elasticsearch-rest-client-6.7.1.jar:6.7.1]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:233) ~[elasticsearch-rest-client-6.7.1.jar:6.7.1]
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1764) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1734) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1696) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
        at org.elasticsearch.client.IndicesClient.create(IndicesClient.java:191) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
        at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndex(ElasticsearchClientV6.java:240) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndex(ElasticsearchClientV6.java:603) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndices(ElasticsearchClientV6.java:436) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl.start(FsCrawlerImpl.java:161) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:270) [fscrawler-cli-2.7-SNAPSHOT.jar:?]
    Caused by: org.elasticsearch.client.ResponseException: method [PUT], host [http://10.0.2.2:9200], URI [/hello?master_timeout=30s&include_type_name=true&timeout=30s], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"request [/hello] contains unrecognized parameter: [include_type_name]"}],"type":"illegal_argument_exception","reason":"request [/hello] contains unrecognized parameter: [include_type_name]"},"status":400}
        at org.elasticsearch.client.RestClient$1.completed(RestClient.java:552) ~[elasticsearch-rest-client-6.7.1.jar:6.7.1]
        at org.elasticsearch.client.RestClient$1.completed(RestClient.java:537) ~[elasticsearch-rest-client-6.7.1.jar:6.7.1]
        at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119) ~[httpcore-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177) ~[httpasyncclient-4.1.2.jar:4.1.2]
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[httpasyncclient-4.1.2.jar:4.1.2]
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[httpasyncclient-4.1.2.jar:4.1.2]
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_201]
12:46:22,484 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [hello]
12:46:22,485 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
12:46:22,486 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
12:46:22,487 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [hello] stopped
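The failing call can be replayed outside FSCrawler to confirm it is the node, not the crawler settings, that rejects it. The sketch below rebuilds the URI shown in the log (PUT /hello?master_timeout=30s&include_type_name=true&timeout=30s) and sends it with Python's standard library. The helper names create_index_url and try_create_index are hypothetical, and this is not how FSCrawler itself builds the request. Be aware that on a node that does accept the parameter, this really creates the index "hello" (or returns an error if it already exists).

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Dict, Tuple

# Node URL and index name copied from the FSCrawler settings in the question.
ES_URL = "http://10.0.2.2:9200"
INDEX = "hello"


def create_index_url(base_url: str, index: str) -> str:
    """Rebuild the URI from the log, including include_type_name=true."""
    # Parameter order matches the log: master_timeout, include_type_name, timeout.
    params = {"master_timeout": "30s", "include_type_name": "true", "timeout": "30s"}
    return f"{base_url.rstrip('/')}/{urllib.parse.quote(index)}?{urllib.parse.urlencode(params)}"


def try_create_index(base_url: str, index: str) -> Tuple[int, Dict[str, Any]]:
    """Send the PUT with an empty JSON body; return (HTTP status, parsed JSON body)."""
    request = urllib.request.Request(
        create_index_url(base_url, index),
        data=b"{}",
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; its body holds Elasticsearch's error JSON.
        return err.code, json.load(err)
```

Calling status, body = try_create_index(ES_URL, INDEX) against the node from the question should come back with status 400 and the same "contains unrecognized parameter: [include_type_name]" reason shown in the log.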

Kindly help me to solve this issue.

Thanks in advance.

Asked by Gowtham Raj; edited by dadoonet.

1 Answer

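I was running Elasticsearch 6.4; upgrading to Elasticsearch 6.7 solved the issue. The stack trace shows why: this FSCrawler 2.7-SNAPSHOT build bundles the 6.7.1 Elasticsearch REST client, which adds include_type_name=true to the create-index request (PUT /hello?...&include_type_name=true&...), and Elasticsearch 6.4 rejects that parameter as unrecognized.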

Credit to @dadoonet: https://github.com/dadoonet/fscrawler/issues/713
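To catch this mismatch before starting the crawler, you could read the node's version from the Elasticsearch root endpoint (GET /, field version.number) and compare it with the 6.7 minimum reported above. This is a minimal sketch under that assumption: MIN_VERSION and the helpers parse_version, node_version and meets_minimum are hypothetical names, and the check covers only this one minimum, not FSCrawler compatibility in general.

```python
import json
import urllib.request
from typing import Tuple

# Assumed minimum for this setup: the trace shows FSCrawler 2.7-SNAPSHOT
# bundling the 6.7.1 REST client, and upgrading the node to 6.7 fixed the error.
MIN_VERSION: Tuple[int, ...] = (6, 7)


def parse_version(number: str) -> Tuple[int, ...]:
    """Turn '6.4.3' or '7.0.0-SNAPSHOT' into a tuple of ints, e.g. (6, 4, 3)."""
    core = number.split("-", 1)[0]  # drop qualifiers such as -SNAPSHOT
    return tuple(int(part) for part in core.split("."))


def node_version(base_url: str) -> str:
    """Read version.number from the Elasticsearch root endpoint (GET /)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=10) as resp:
        return json.load(resp)["version"]["number"]


def meets_minimum(number: str, minimum: Tuple[int, ...] = MIN_VERSION) -> bool:
    """Tuple comparison: (6, 4, 3) < (6, 7), while (6, 7, 1) >= (6, 7)."""
    return parse_version(number) >= minimum
```

For the node in the question, meets_minimum(node_version("http://10.0.2.2:9200")) would return False while it runs 6.4, and True after the upgrade to 6.7. Comparing tuples of integers avoids the string-comparison trap where "6.10" sorts before "6.7".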
