
I'm trying to scrape a web page with requests-html, but it raises an error saying a file is missing, even though I ran pip install requests-html and it reported all requirements as satisfied. How do I get around this?

from requests_html import HTMLSession
import time

url = 'https://soundcloud.com/jujubucks'

s = HTMLSession()
r = s.get(url)

r.html.render()

songs = r.html.xpath('//*[@id="content"]/div/div[4]/div[1]/div/div[2]/div/div[2]', first=True)

print(songs)

This produces a side-by-side configuration error that refers to sxstrace:

OSError: [WinError 14001] The application has failed to start because its side-by-side 
configuration is incorrect. Please see the application event log or use the command-line 
sxstrace.exe tool for more detail

Apparently this is the missing file according to the event log, but I don't know where to get it:

Activation context generation failed for "C:\Users\houst\AppData\Local\pyppeteer\pyppeteer\local-chromium\588429\chrome-win32\chrome.exe". Dependent Assembly 71.0.3542.0,language="*",type="win32",version="71.0.3542.0" could not be found. Please use sxstrace.exe for detailed diagnosis.

2 Answers

1

I came here with the same question, but the only answer didn't apply to me. My Windows 10 x64 PC has five versions of Python: four installed via Anaconda, and Python 3.10 installed via the Microsoft Store. I was debugging in VS Code using the Microsoft Store version, with requests-html installed (pip install requests-html) for that version of Python only.

The VS Code stack trace showed that subprocess.py failed to launch a subprocess. Windows Event Viewer showed a failed attempt to launch chrome.exe in: C:\Users\username\AppData\Local\pyppeteer\pyppeteer\local-chromium\588429\chrome-win32

Windows search showed that chrome.exe - which was downloaded and extracted automatically the first time an attempt was made to call response.html.render() - was actually located at: C:\Users\username\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\Local\pyppeteer\pyppeteer\local-chromium\588429\chrome-win32

As a workaround, and although I have no idea why the issue occurred, I moved the chrome-win32 directory to the expected location, after which Chrome ran the JavaScript on the page and returned the HTML correctly.
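The search-and-move steps above can also be sketched in Python. This is only a rough illustration using the standard library: `find_chrome` and `copy_chromium` are hypothetical helper names, not part of requests-html or pyppeteer, and the sketch copies the folder rather than moving it so the original download stays in place.

```python
import shutil
from pathlib import Path


def find_chrome(root: Path, name: str = "chrome.exe") -> list:
    """Return every file called `name` below `root` (empty list if none)."""
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob(name) if p.is_file())


def copy_chromium(actual_dir: Path, expected_dir: Path) -> bool:
    """Copy the extracted Chromium folder to the location the launcher expects.

    Returns True if a copy was made, False if expected_dir already exists
    (nothing is overwritten in that case).
    """
    if expected_dir.exists():
        return False
    # Create the missing parent folders (e.g. ...\local-chromium\588429).
    expected_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(actual_dir, expected_dir)
    return True
```

To use it, call `find_chrome(Path.home() / "AppData" / "Local")` to see where chrome.exe actually ended up, then pass the chrome-win32 folder that contains it as `actual_dir` and the chrome-win32 path from the Event Viewer message as `expected_dir`.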

Barney
  • 1
    Instead of moving the chrome-win32 directory, you could also have created a link pointing to it using `mklink /d`. [documentation](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/mklink) – R.S. Jun 03 '22 at 05:09
0

requests_html depends on pyppeteer, but it seems your pyppeteer has not installed Chromium completely. Try installing Chromium manually: activate the environment containing pyppeteer and run pyppeteer-install.exe.
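Before reinstalling, it may help to check which Chromium path pyppeteer itself expects and whether a file exists there. The sketch below assumes pyppeteer's `chromium_downloader.chromium_executable()` helper is available in the installed version; `chromium_status` is a hypothetical wrapper name used only for illustration.

```python
from pathlib import Path


def chromium_status():
    """Return (expected_path, exists) as seen by pyppeteer.

    Returns None if pyppeteer cannot be imported in this environment.
    """
    try:
        from pyppeteer import chromium_downloader
    except ImportError:
        return None
    # Path of the Chromium binary pyppeteer would try to launch.
    exe = Path(chromium_downloader.chromium_executable())
    return exe, exe.exists()


if __name__ == "__main__":
    print(chromium_status())
```

Run it with the same interpreter that runs the scraper; comparing the printed path with the one in the Windows event log shows whether both point at the same folder.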

Faizan AlHassan
  • Hello, this is very late, but when I do this it says "chromium is already installed." – TCK Dec 01 '22 at 02:46
  • You should create (or find) an issue in the `requests_html` repo. I think there is a dependency missing in your OS. – Faizan AlHassan Dec 01 '22 at 10:21
  • 1
    I was using Linux, where I constantly ran into issues, but when I switched to Windows I encountered this error. There is actually no missing dependency; what fixed it for me is the answer above yours. For some reason, on Windows, requests-html (or, technically, pyppeteer) searches for Chromium in a path that doesn't exist rather than the path where it was installed. Once I created the path it was looking for and put the Chromium files there, it worked. – TCK Dec 01 '22 at 14:07
  • Strange issue. But nice to see you figured it out. – Faizan AlHassan Dec 02 '22 at 15:08