For everything related to the FSCrawler project.
Questions tagged [fscrawler]
40 questions
0
votes
1 answer
Why is FSCrawler refusing to trust the certificate when I have set SSL verification to 'false'?
Here's my YAML file for FSCrawler:
name: "data_science"
fs:
  url: "C:\\tmp\\DS_books"
  update_rate: "15m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object:…

ScottyCov
- 21
- 5
0
votes
0 answers
How do I get the root CA chain certificate or the Elasticsearch server certificate so that I can ingest documents from FSCrawler?
After setting up Elasticsearch and pasting the enrollment token into Kibana, is there an already-created certificate that I can use to set up an HTTPS connection for FSCrawler, or do I need to create a new one using the certutil…

ScottyCov
- 21
- 5
0
votes
1 answer
Accessing google cloud bucket via FS Crawler (elasticsearch)
The project I am currently working on needs a search engine to search several tens of thousands of PDF files. When a user searches the website for a keyword, the search engine will return a snippet from the PDF files matching the search criteria.…

frankmurphy
- 194
- 1
- 3
- 13
0
votes
1 answer
How to use FSCrawler on Ubuntu?
Is it possible to use FSCrawler on Ubuntu? I have used it on Windows and it works fine. When I try to follow the same steps on Ubuntu, I get all kinds of errors.
First I just tried to pull the Docker image and run it according to this…

Sebastian Ramirez
- 3
- 1
- 2
0
votes
0 answers
Run several indexes as a service using fscrawler
I have successfully created an index job using FSCrawler and made it run as a service on Windows, as shown in the documentation:
set JAVA_HOME=c:\Program Files\Java\jdk15.0.1
set FS_JAVA_OPTS=-Xmx2g -Xms2g
/Elastic/fscrawler/bin/fscrawler.bat…

Denn
- 447
- 1
- 6
- 27
0
votes
0 answers
Elasticsearch: Highlight in specific documents based on file size criteria in an index created by FSCrawler?
Currently I am using the following search query to highlight matches based on the query entered. The index is created by FSCrawler.
GET index_name/_search
{
  "query": {
    "query_string": {
      "query": "my_string_query_here"
    }
  },
  …

josh
- 11
- 2
0
votes
1 answer
FSCrawler on Windows _settings.yml, folders/directories and drives
FSCrawler 2.7 on Windows server
For a given job, e.g. test1, a _settings.yaml file is automatically created,
e.g. c:\users\jbloggs\.fscrawler\test1\_settings.yml
You need to specify where the documents you wish to crawl are located:
fs:
url: "drive &…

JohnC
- 2,687
- 1
- 22
- 30
0
votes
1 answer
How to connect FSCrawler REST with docker-compose
I've successfully indexed a PDF using FSCrawler, but I'm not able to connect to the REST client for FSCrawler to make a pipeline to Elasticsearch. This is my command in docker-compose:
command: fscrawler fscrawler_rest
I'm able to query…

koopmac
- 936
- 10
- 27
0
votes
1 answer
FSCrawler Error while crawling E:\TestFilesToBeIndexed\subfolder: java.net.ConnectException: Connection timed out: connect
Error while crawling path\to\file_folder: java.net.ConnectException: Connection timed out: connect
I am trying to ingest files from a remote server using FSCrawler into an existing Elasticsearch index (which is on my local machine) but I am getting the above…

Dimple
- 51
- 6
0
votes
1 answer
Is it possible to ingest file content using FSCrawler into a particular _id of an existing index in Elasticsearch?
I have already ingested data into an existing Elasticsearch index, using the value of the database column "mainid" as the _id. Now I have another table with two columns: "mainid" and the path to the files. I want to ingest these files using…

Dimple
- 51
- 6
0
votes
1 answer
Importing .eml format data into Elasticsearch
I have emails in .eml format that need to be parsed and then imported into Elasticsearch through FSCrawler, but FSCrawler cannot extract the sender and recipient information. How can I solve this?
0
votes
1 answer
FSCrawler can't find existing jobs
I'm quite new to the Elastic Stack and want to index documents using FSCrawler. I'm encountering a strange problem:
I create a new job and get a confirmation that it has been successfully created. I can see the newly created folder with the…

xTheProgrammer
- 74
- 10
0
votes
1 answer
Proper way to upload a doc to FSCrawler for indexing in Elasticsearch
I'm prototyping a Rails application to upload documents to FSCrawler (running the REST interface), to incorporate into an Elasticsearch index. Using their example, this works:
response = `curl -F "file=@#{params[:document][:upload].tempfile.path}"…

David Krider
- 886
- 12
- 27
0
votes
1 answer
JVM settings for Elasticsearch and FSCrawler
I am using Elasticsearch and FSCrawler to search about 7 TB of data. The process starts well but stalls after some time. It must be running out of memory; I am trying to increase my heap using…

Denn
- 447
- 1
- 6
- 27
0
votes
1 answer
How do I map an index created by FSCrawler so that I can do exact full-text search on the documents?
I have an index of binary files created by FSCrawler (with the default mapping).
I am querying my index using php-elasticsearch:
if ($q2 == '') {
    $params = [
        'index' => 'trial2',
        'body' => [
            'query' => [
                'term' => [
                    …

Denn
- 447
- 1
- 6
- 27