
I've been struggling with this issue for a while now. I am trying to build a Docker container that scrapes some data with Selenium WebDriver, and the build fails with an error saying a 'module' object is not callable. Here is the build output:

 > [stage-1 6/6] RUN python db_starter.py:
   #10 35.99 Traceback (most recent call last):
   #10 35.99   File "db_starter.py", line 3, in <module>
   #10 35.99     run_backend.update_db()
   #10 35.99   File "/app/run_backend.py", line 11, in update_db
   #10 35.99     search_page = donwload_search_page(query, page)
   #10 35.99   File "/app/get_data.py", line 19, in donwload_search_page
   #10 35.99     soup = BeautifulSoup(html, 'html.parser')
   #10 35.99 TypeError: 'module' object is not callable

Here is my Dockerfile. I tried with both Chrome and Firefox, and the error is the same:

FROM scrapinghub/scrapinghub-stack-scrapy:1.3
FROM python:3.7-slim
COPY . /app
WORKDIR /app

RUN apt-get update                             \
&& apt-get install -y --no-install-recommends \
ca-certificates curl firefox-esr           \
&& rm -fr /var/lib/apt/lists/*                \
&& curl -L https://github.com/mozilla/geckodriver/releases/download/v0.24.0/geckodriver-v0.24.0-linux64.tar.gz | tar xz -C /usr/local/bin \
&& apt-get purge -y ca-certificates curl


RUN pip install --no-cache-dir -r requirements.txt
RUN python db_starter.py

And here is where the code is crashing:

import requests as rq
import bs4 as BeautifulSoup
import time
import os
from selenium import webdriver

def donwload_search_page(query, page):
    options = webdriver.FirefoxOptions()
    options.add_argument("--window-size 1920,1080")
    options.add_argument("--headless")
    driver = webdriver.Firefox(options=options)
    url = "https://www.amazon.com/s?k={query}&page={page}".format(query = query, page = page)
    driver.get(url)
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    driver.close()
    time.sleep(2)

    return soup.text

I really don't get why it says the module is not callable. When I run the code on my machine, in a Jupyter notebook with geckodriver in the same folder, it works, but when I try to build the container, I get this error.

Can anyone help me with this one?

Thank you!

Gustavo Rottgering

1 Answer


I found the error. It was a beginner's mistake.

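import bs4 as BeautifulSoup

should have been

from bs4 import BeautifulSoup

The first form binds the name BeautifulSoup to the bs4 package itself, so BeautifulSoup(html, 'html.parser') tries to call a module, which raises TypeError: 'module' object is not callable. The second form imports the BeautifulSoup class defined inside the package, which is what the code actually needs to call.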
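A minimal sketch reproducing the same mistake with only the Python standard library, so nothing needs installing: the datetime module contains a class that is also named datetime, so it plays the role of bs4 and BeautifulSoup here. The aliases wrong_datetime and right_datetime and the helper call_name are hypothetical names chosen for illustration. Recent Python versions may append a "Did you mean" hint to the error, so only the start of the message is relied on.

```python
# Wrong: binds the *module* to the name, like `import bs4 as BeautifulSoup`.
import datetime as wrong_datetime

# Right: binds the *class* defined inside the module, like `from bs4 import BeautifulSoup`.
from datetime import datetime as right_datetime


def call_name(obj):
    """Hypothetical helper: call obj(2020, 1, 1) and return the result,
    or the TypeError message if obj cannot be called."""
    try:
        return obj(2020, 1, 1)
    except TypeError as exc:
        return str(exc)


print(type(wrong_datetime).__name__)  # module
print(callable(wrong_datetime))       # False
print(call_name(wrong_datetime))      # message begins with: 'module' object is not callable
print(call_name(right_datetime))      # 2020-01-01 00:00:00
```

Checking type(name) or callable(name) like this is a quick way to confirm what an import actually bound before digging into Selenium, geckodriver, or the Docker setup.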

Thanks to those who checked.

arun