Questions tagged [fscrawler]

For everything related to the FSCrawler project.

40 questions
0
votes
1 answer

Why is FSCrawler refusing to trust the certificate when I have set SSL verification to "false"?

Here's my YAML file for FSCrawler: name: "data_science" fs: url: "C:\\tmp\\DS_books" update_rate: "15m" excludes: - "*/~*" json_support: false filename_as_id: false add_filesize: true remove_deleted: true add_as_inner_object:…
ScottyCov
  • 21
  • 5
0
votes
0 answers

How do I get the root CA chain certificate or the Elasticsearch server certificate so that I can ingest documents from FSCrawler?

After I have set up Elasticsearch and pasted the enrollment token into Kibana, is there a certificate that has already been created that I can use to set up an HTTPS connection for FSCrawler, or do I need to create a new one using the certutil…
ScottyCov
  • 21
  • 5
0
votes
1 answer

Accessing a Google Cloud bucket via FSCrawler (Elasticsearch)

The project I am currently working on needs a search engine to search tens of thousands of PDF files. When a user searches the website for a certain keyword, the search engine will return a snippet of the PDF files matching the search criteria.…
frankmurphy
  • 194
  • 1
  • 3
  • 13
0
votes
1 answer

How to use FSCrawler on Ubuntu?

Is it possible to use FSCrawler on Ubuntu? I have used it on Windows and it works fine. When I try to follow the same implementation on Ubuntu, I get all kinds of errors. First I just tried to pull the Docker image and run it according to this…
0
votes
0 answers

Run several indexes as a service using FSCrawler

I have successfully created an index job using FSCrawler and made it run as a service on Windows as shown in the documentation: set JAVA_HOME=c:\Program Files\Java\jdk15.0.1 set FS_JAVA_OPTS=-Xmx2g -Xms2g /Elastic/fscrawler/bin/fscrawler.bat…
Denn
  • 447
  • 1
  • 6
  • 27
0
votes
0 answers

Elasticsearch: Highlight in specific documents based on file size criteria in an index made via FSCrawler?

Currently I am using the following search query to highlight matches based on the query entered. The index is made via FSCrawler. GET index_name/_search { "query": { "query_string" :{ "query": "my_string_query_here" } }, …
josh
  • 11
  • 2
0
votes
1 answer

FSCrawler on Windows _settings.yml, folders/directories and drives

FSCrawler 2.7 on Windows Server. For a given job, e.g. test1, a _settings.yaml folder is automatically created, e.g. c:\users\jbloggs\.fscrawler\test1\_settings.yml. You need to specify where the documents you wish to crawl are located: fs: url: "drive &…
JohnC
  • 2,687
  • 1
  • 22
  • 30
0
votes
1 answer

How to connect FSCrawler REST with docker-compose

I've successfully indexed a PDF using FSCrawler, but I'm not able to connect to the REST client for FSCrawler to make a pipeline to Elasticsearch. This is my command in docker-compose: command: fscrawler fscrawler_rest I'm able to query…
koopmac
  • 936
  • 10
  • 27
0
votes
1 answer

FSCrawler Error while crawling E:\TestFilesToBeIndexed\subfolder: java.net.ConnectException: Connection timed out: connect

Error while crawling path\to\file_folder: java.net.ConnectException: Connection timed out: connect I am trying to ingest files from a remote server using FSCrawler into an existing Elasticsearch index (which is on my local machine), but I am getting the above…
Dimple
  • 51
  • 6
0
votes
1 answer

Is it possible to ingest file content using FSCrawler into a particular _id of an existing index in Elasticsearch?

I have already ingested data into an existing Elasticsearch index, using the value of the database column "mainid" as the _id. Now I have another table with two columns: "mainid" and the path to the files. I want to ingest these files using…
Dimple
  • 51
  • 6
0
votes
1 answer

Importing .eml format data into Elasticsearch

I have emails in .eml format that need to be parsed and then imported into Elasticsearch through FSCrawler, but FSCrawler cannot extract the sender and recipient information. How can I solve this?
0
votes
1 answer

FSCrawler can't find existing jobs

I'm quite new to the Elastic Stack and want to index documents using FSCrawler. I'm encountering a strange problem: I create a new job and get a confirmation that it has been successfully created. I can see the newly created folder with the…
0
votes
1 answer

Proper way to upload a doc to FSCrawler for indexing in Elasticsearch

I'm prototyping a Rails application to upload documents to FSCrawler (running the REST interface), to incorporate into an Elasticsearch index. Using their example, this works: response = `curl -F "file=@#{params[:document][:upload].tempfile.path}"…
David Krider
  • 886
  • 12
  • 27
0
votes
1 answer

JVM settings for Elasticsearch and FSCrawler

I am using Elasticsearch and FSCrawler to search about 7 TB of data. The process starts well but stalls after some time. It must be running out of memory; I am trying to increase my heap using…
Denn
  • 447
  • 1
  • 6
  • 27
0
votes
1 answer

How do I map an index created by FSCrawler so that I can do an exact full-text search on the documents?

I have an index of binary files created by FSCrawler (with the default mapping). I am querying my index using php-elasticsearch: if ($q2 == '') { $params = [ 'index' => 'trial2', 'body' => [ 'query' => [ 'term' => [ …
Denn
  • 447
  • 1
  • 6
  • 27