I used Scrapy Cluster to solve the problem and I'm sharing my experience:
The Docker installation was hard for me to control and debug, so I tried the Cluster Quick-start instead, and it worked better.
I have five machines available in my cluster. I used one of them to host Apache Kafka and Zookeeper, and another for the Redis database. It's important to make sure those machines are reachable from the ones you are going to use for spidering.
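As a quick sanity check from one of the spider machines, you can test that each service port answers. This is just a minimal sketch; the IPs are placeholders for my setup, and 2181/9092/6379 are the standard Zookeeper/Kafka/Redis ports, assuming you kept the defaults:

```python
import socket

# Placeholder hosts: Kafka + Zookeeper on one box, Redis on another.
SERVICES = {
    "zookeeper": ("10.0.0.1", 2181),
    "kafka": ("10.0.0.1", 9092),
    "redis": ("10.0.0.2", 6379),
}

for name, (host, port) in SERVICES.items():
    try:
        with socket.create_connection((host, port), timeout=5):
            print(f"{name}: reachable at {host}:{port}")
    except OSError as exc:
        print(f"{name}: NOT reachable at {host}:{port} ({exc})")
```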
Once these three components were installed and running, I installed Scrapy Cluster's requirements in a Python 3.6 environment. Then I configured a local settings file with the IP addresses of the hosts and made sure all the offline and online tests passed.
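The local settings file mostly boils down to pointing Scrapy Cluster at your hosts. The sketch below uses placeholder IPs, and the setting names are the ones I recall from the docs, so double-check them against your version's default settings:

```python
# localsettings.py - overrides applied on top of Scrapy Cluster's defaults.
# IPs are placeholders for my machines; adjust to your cluster.

REDIS_HOST = '10.0.0.2'
REDIS_PORT = 6379

KAFKA_HOSTS = '10.0.0.1:9092'

ZOOKEEPER_HOSTS = '10.0.0.1:2181'
```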
With everything set up, I was able to run the first spider (the official documentation provides an example). The idea is that you start multiple instances of your spider (you can, for example, use tmux to open 10 terminal windows and run one instance in each). When you feed Apache Kafka a URL to be crawled, it is pushed onto a queue in Redis, which your instances periodically poll for new pages to crawl.
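The quick-start feeds URLs through the kafka monitor's feed command; the sketch below shows roughly the same thing done by hand with kafka-python, assuming the default incoming topic name (demo.incoming) and the request fields (url, appid, crawlid) I remember from the docs. Both are configurable, so verify them against your settings:

```python
import json
from kafka import KafkaProducer

# Produce a crawl request to the topic the kafka monitor listens on.
# Topic name and request fields are assumed defaults; verify them
# against your kafka_monitor settings.
producer = KafkaProducer(
    bootstrap_servers='10.0.0.1:9092',  # placeholder Kafka host
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)

request = {
    "url": "http://example.com",
    "appid": "testapp",
    "crawlid": "abc1234",
}

producer.send('demo.incoming', request)
producer.flush()
```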
If your spider extracts more URLs from the page you passed initially, they go back to Redis and may be picked up by other instances, and that's where you can see the power of this distribution.
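If you're curious, you can watch the queues grow in Redis while the instances work. The key pattern below is what I recall the crawl queues looking like, so treat it as an assumption and adapt it to whatever keys you actually see:

```python
import redis

# Peek at Scrapy Cluster's crawl queues while the spiders run.
# The "<spider>:<domain>:queue" key pattern is from memory; adjust the
# match pattern if your keys look different.
r = redis.Redis(host='10.0.0.2', port=6379)  # placeholder Redis host

for key in r.scan_iter(match='*:queue'):
    # The queues are priority queues stored as sorted sets.
    print(key.decode(), r.zcard(key))
```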
Once a page is crawled, the result is sent to a Kafka topic.
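A minimal consumer for those results looks like the sketch below; demo.crawled_firehose is the default output topic name as far as I remember, and the item fields shown are assumptions, so confirm both against your settings and output:

```python
import json
from kafka import KafkaConsumer

# Read crawl results as they arrive. The topic name is the assumed
# default firehose topic; it is configurable in Scrapy Cluster.
consumer = KafkaConsumer(
    'demo.crawled_firehose',
    bootstrap_servers='10.0.0.1:9092',  # placeholder Kafka host
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
)

for message in consumer:
    item = message.value
    print(item.get('url'), item.get('status_code'))
```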
The official documentation is extensive, and you can find more details on installation and setup there.