UnstructuredURLLoader/SeleniumURLLoader not working in LangChain for JS based websites

Question

I am trying to use a document loader for website URLs. However, for some websites UnstructuredURLLoader returns:

(Document(page_content='Please enable JS and disable any ad blocker',
 metadata={'source': 'https://wellfound.com/company/chorus-one'})

So I wanted to use SeleniumURLLoader, which the documentation presents as the way around this problem.

I installed the dependencies with pip install selenium webdriver_manager, which gave me these versions:

 selenium                      4.10.0
 webdriver-manager             3.8.6

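from langchain.document_loaders import UnstructuredURLLoader, SeleniumURLLoader

# The URL that returned the "Please enable JS" placeholder above
urls = ["https://wellfound.com/company/chorus-one"]

loader = SeleniumURLLoader(urls=urls)
data = loader.load()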

When I run this, I keep getting these errors:

The version of chrome cannot be detected. Trying with latest driver version

WebDriverException: Message: unknown error: cannot find Chrome binary Stacktrace: #0 0x55c7d5ee44e3 #1 0x55c7d5c13c76 #2 0x55c7d5c3a757 #3 0x55c7d5c39029 #4 0x55c7d5c77ccc #5 0x55c7d5c7747f #6 0x55c7d5c6ede3 #7 0x55c7d5c442dd #8 0x55c7d5c4534e #9 0x55c7d5ea43e4 #10 0x55c7d5ea83d7 #11 0x55c7d5eb2b20 #12 0x55c7d5ea9023 #13 0x55c7d5e771aa #14 0x55c7d5ecd6b8 #15 0x55c7d5ecd847 #16 0x55c7d5edd243 #17 0x7fbddad0d609 start_thread

What am I doing wrong?

Perhaps you should verify whether your URLs are directed towards any of the following non-HTML file types: `jpg`, `jpeg`, `JPG`, `JPEG`, `png`, `PNG`, `svg`, `gif`, `GIF`, `ttf`, `woff`, `js`, `json`, `css`, `css2`, `ico`, `xml`, `mp3`, `mp4`, `php`, `rdf`, `axd`, `eot`, `pdf`, `doc`, `docx`, `xlsx`. If that is indeed the case, you must eliminate such URLs, as these files cannot be processed using either `UnstructuredURLLoader` or `SeleniumURLLoader`. — Carlos Luis Rivera, Jul 20 '23 at 13:31
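The check suggested in the comment above could be sketched roughly as follows. This is only an illustration using the standard library: the helper names `is_probably_html` and `filter_html_urls` are hypothetical, the extension set is copied from the comment (lowercased), and the `example.com` URLs are placeholders. It only looks at the URL path, so it cannot tell what a server actually returns.

```python
import posixpath
from urllib.parse import urlparse

# Extensions listed in the comment, lowercased so the match is case-insensitive.
NON_HTML_EXTENSIONS = {
    "jpg", "jpeg", "png", "svg", "gif", "ttf", "woff", "js", "json",
    "css", "css2", "ico", "xml", "mp3", "mp4", "php", "rdf", "axd",
    "eot", "pdf", "doc", "docx", "xlsx",
}


def is_probably_html(url: str) -> bool:
    """Return False if the URL path ends in one of the listed extensions."""
    path = urlparse(url).path  # drops the query string and fragment
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    return extension not in NON_HTML_EXTENSIONS


def filter_html_urls(urls):
    """Keep only the URLs that do not look like non-HTML files."""
    return [url for url in urls if is_probably_html(url)]


if __name__ == "__main__":
    candidates = [
        "https://wellfound.com/company/chorus-one",
        "https://example.com/report.PDF",      # placeholder URL
        "https://example.com/logo.png?v=2",    # placeholder URL
    ]
    print(filter_html_urls(candidates))
```

The URL from the question has no file extension, so it passes this filter; the problem there is more likely the missing browser than the URL type.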

Jason · Answer 1 · 2023-07-12T16:56:18.660

0

You need to install Chromium, because Selenium cannot find a Chrome binary to launch. On a system that uses apt:

sudo apt-get install chromium
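To check whether a browser binary is actually discoverable before calling the loader, something like the sketch below may help. The candidate names and macOS paths are assumptions about common install locations, not an exhaustive list, and `find_chrome_binary` and `load_with_selenium` are hypothetical helpers. Passing the result through `binary_location` assumes your installed LangChain version's `SeleniumURLLoader` accepts that keyword; verify this against your version.

```python
import os
import shutil
from typing import List, Optional

# Assumed executable names to look up on PATH (adjust for your system).
CANDIDATE_NAMES = [
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
]

# Assumed default macOS application paths (adjust for your system).
CANDIDATE_PATHS = [
    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "/Applications/Chromium.app/Contents/MacOS/Chromium",
]


def find_chrome_binary(
    names: List[str] = CANDIDATE_NAMES,
    paths: List[str] = CANDIDATE_PATHS,
) -> Optional[str]:
    """Return the first Chrome/Chromium executable found, or None."""
    for name in names:
        found = shutil.which(name)  # searches the directories on PATH
        if found:
            return found
    for path in paths:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


def load_with_selenium(urls: List[str]):
    """Load URLs with SeleniumURLLoader, pointing it at the binary found."""
    # Imported here so the lookup helper works without LangChain installed.
    from langchain.document_loaders import SeleniumURLLoader

    binary = find_chrome_binary()
    if binary is None:
        raise RuntimeError("No Chrome or Chromium binary found; install one first.")
    # binary_location is assumed to be supported by your LangChain version.
    loader = SeleniumURLLoader(urls=urls, binary_location=binary)
    return loader.load()
```

If `find_chrome_binary()` returns `None`, the "cannot find Chrome binary" error from the question is expected, and installing the browser or adding its real path to the candidates is the next step.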

edited Jul 12 '23 at 16:56

answered Jul 10 '23 at 03:51


@Jason Then how do I properly install Chromium in this situation? – mCs Jul 12 '23 at 08:59
@mCs Which OS are you working on? – Jason Jul 12 '23 at 16:56
I use macOS, but I do have the Chromium browser installed. – mCs Jul 16 '23 at 10:11
