When fscrawler is run for the fist time (i.e., docker-compose run fscrawler
) it creates /config/{fscrawer_job}/_settings.yml
with the following default setting:
elasticsearch:
nodes:
- url: "http://127.0.0.1:9200"
This will cause fscrawler to attempt to connect to localhost (i.e., 127.0.0.1). However, this will fail when fscrawler is located within a docker container because it is attempting to connect with the localhost of the CONTAINER. This was particularly confusing in my case because elasticsearch WAS accessible as localhost, but on the localhost of my physical computer (and NOT localhost of the container). Changing the url allowed fscrawler to connect to network address where elasticsearch actually resides.
elasticsearch:
nodes:
- url: "http://elasticsearch:9200"
I used the following docker image: https://hub.docker.com/r/toto1310/fscrawler
# FILE: docker-compose.yml
version: '2.2'
services:
# FSCrawler
fscrawler:
image: toto1310/fscrawler
container_name: fscrawler
volumes:
- ${PWD}/config:/root/.fscrawler
- ${PWD}/data:/tmp/es
networks:
- esnet
command: fscrawler job_name
# Elasticsearch Cluster
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:7.3.2
container_name: elasticsearch
environment:
- node.name=elasticsearch
- discovery.seed_hosts=elasticsearch2
- cluster.initial_master_nodes=elasticsearch,elasticsearch2
- cluster.name=docker-cluster
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- esdata01:/usr/share/elasticsearch/data
ports:
- 9200:9200
networks:
- esnet
elasticsearch2:
image: docker.elastic.co/elasticsearch/elasticsearch:7.3.2
container_name: elasticsearch2
environment:
- node.name=elasticsearch2
- discovery.seed_hosts=elasticsearch
- cluster.initial_master_nodes=elasticsearch,elasticsearch2
- cluster.name=docker-cluster
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- esdata02:/usr/share/elasticsearch/data
networks:
- esnet
volumes:
esdata01:
driver: local
esdata02:
driver: local
networks:
esnet:
Ran docker-compose up elasticsearch elasticsearch2
to bring up elasticsearch nodes.
Ran docker-compose run fscrawler
to create _settings.yml
Edited _settings.yml
to
elasticsearch:
nodes:
- url: "http://elasticsearch:9200"
Started fscrawler docker-compose up fscrawler