I'm trying to scrape a Wikipedia table in Python from within RStudio (in R Markdown) via `reticulate`. I can't manage to do it with R (I tried `rvest`, but the columns end up misaligned and I can't figure out exactly why), which is why I'm using Python: I have an `r-reticulate` Conda env in which I installed `BeautifulSoup` and `requests`.
The code I've written runs flawlessly in my Jupyter notebook running the `r-reticulate` kernel.
However, when I try to run it in RStudio, I get an `ImportError` saying `lxml` was not found. That can't be right, because it is installed, as you can see in the `conda list` output at the bottom (and as evidenced by my working notebook).
Here is my full code:
```{r libraries, include=FALSE}
library(reticulate)
use_condaenv("r-reticulate", required = TRUE)
```
```{python results="hide"}
import pandas as pd
import requests
from bs4 import BeautifulSoup
```
```{python}
url = "https://en.wikipedia.org/wiki/COVID-19_lockdowns"
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")
table = soup.find("table", {"class": "wikitable"})
dfs = pd.read_html(str(table)) # this is the line that generates the error
df = dfs[0]
df.head(20)
```
This is the error output from the last chunk:
```
ImportError: lxml not found, please install it

Detailed traceback:
  File "<string>", line 1, in <module>
  File "C:\PROGRA~3\ANACON~1\envs\R-RETI~1\lib\site-packages\pandas\util\_decorators.py", line 299, in wrapper
    return func(*args, **kwargs)
  File "C:\PROGRA~3\ANACON~1\envs\R-RETI~1\lib\site-packages\pandas\io\html.py", line 1100, in read_html
    displayed_only=displayed_only,
  File "C:\PROGRA~3\ANACON~1\envs\R-RETI~1\lib\site-packages\pandas\io\html.py", line 889, in _parse
    parser = _parser_dispatch(flav)
  File "C:\PROGRA~3\ANACON~1\envs\R-RETI~1\lib\site-packages\pandas\io\html.py", line 846, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
```
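As far as I can tell from the pandas source, `read_html` imports `lxml.etree` through an optional-dependency helper that catches any `ImportError` and then reports it as "lxml not found". So a package that is installed but fails to *load* (for example because a DLL can't be found) would produce the same message. A minimal sketch for surfacing the underlying exception by importing the modules directly, run from a `{python}` chunk in the same R session (`diagnose_import` is a hypothetical helper name, not part of any library):

```python
import importlib
import sys


def diagnose_import(module_name):
    """Import module_name directly and return (ok, detail).

    Unlike pandas' optional-dependency check, this keeps the original
    exception text (e.g. a DLL load failure) instead of replacing it
    with a generic "not found" message.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__file__", "<built-in>")


if __name__ == "__main__":
    # Which interpreter did reticulate actually start?
    print("Interpreter:", sys.executable)
    # lxml.etree is the module pandas tries to import for read_html.
    for name in ("lxml.etree", "bs4", "html5lib"):
        ok, detail = diagnose_import(name)
        print(f"{name}: {'OK' if ok else 'FAILED'} -> {detail}")
```

If `lxml.etree` fails with something other than `ModuleNotFoundError`, the package is present and the problem is in loading it rather than installing it. pandas also appears to run its check only once per Python session, so after fixing the cause the R session probably needs a restart.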
The env name is truncated (`R-RETI~1`), but I don't have any other env whose name starts with that, so I'm sure it is the correct one. `py_config()` also shows that the correct env is being used (output below). I don't understand what is going on, or which component is misbehaving (is it coming from `reticulate`?)...
```
python:         C:/ProgramData/Anaconda3/envs/r-reticulate/python.exe
libpython:      C:/ProgramData/Anaconda3/envs/r-reticulate/python37.dll
pythonhome:     C:/ProgramData/Anaconda3/envs/r-reticulate
version:        3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:37:01) [MSC v.1916 64 bit (AMD64)]
Architecture:   64bit
numpy:          C:/ProgramData/Anaconda3/envs/r-reticulate/Lib/site-packages/numpy
numpy_version:  1.20.1

NOTE: Python version was forced by use_python function
```
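One possible cause on Windows (an assumption on my part, not something the output above confirms): conda envs keep shared libraries such as the `libxml2`/`libxslt` DLLs in `<env>\Library\bin`, and `conda activate` normally adds that folder to `PATH`. A Jupyter kernel started from an activated env would likely see it, while an R session that loads the env's `python37.dll` without activation might not. A small sketch to check this from a `{python}` chunk (`env_dll_dir_on_path` is a hypothetical helper):

```python
import os
import sys


def env_dll_dir_on_path(prefix=None, path_value=None):
    """Return (dll_dir, on_path) for the env rooted at `prefix`.

    Assumes the conda-on-Windows layout where DLLs live in
    <prefix>/Library/bin. Defaults to the running interpreter's prefix
    and the current PATH.
    """
    prefix = sys.prefix if prefix is None else prefix
    path_value = os.environ.get("PATH", "") if path_value is None else path_value

    def norm(p):
        # On Windows this makes "C:/Env/" and "c:\env" compare equal.
        return os.path.normcase(os.path.normpath(p))

    dll_dir = norm(os.path.join(prefix, "Library", "bin"))
    entries = {norm(p) for p in path_value.split(os.pathsep) if p}
    return dll_dir, dll_dir in entries


if __name__ == "__main__":
    dll_dir, on_path = env_dll_dir_on_path()
    print(dll_dir, "is on PATH" if on_path else "is NOT on PATH")
```

8.3 short names such as `PROGRA~3` (seen in the traceback) won't compare equal to their long forms, so a `False` here is worth double-checking by eye. If the folder really is missing, one untested thing to try is prepending it to `PATH` with `Sys.setenv()` in R before the first Python chunk runs.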
Output of `conda list`:

```
(r-reticulate) C:\[...]>conda list
# packages in environment at C:\ProgramData\Anaconda3\envs\r-reticulate:
#
# Name Version Build Channel
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.1 py_0 conda-forge
beautifulsoup4 4.9.3 pyhb0f4dca_0 conda-forge
brotlipy 0.7.0 py37hcc03f2d_1001 conda-forge
bs4 4.9.3 0 conda-forge
ca-certificates 2020.12.5 h5b45459_0 conda-forge
certifi 2020.12.5 py37h03978a9_1 conda-forge
cffi 1.14.5 py37hd8e9650_0 conda-forge
chardet 4.0.0 py37h03978a9_1 conda-forge
colorama 0.4.4 pyh9f0ad1d_0 conda-forge
cryptography 3.4.6 py37h20c650d_0 conda-forge
cycler 0.10.0 py_2 conda-forge
decorator 4.4.2 py_0 conda-forge
freetype 2.10.4 h546665d_1 conda-forge
icu 68.1 h0e60522_0 conda-forge
idna 2.10 pyh9f0ad1d_0 conda-forge
intel-openmp 2020.3 h57928b3_311 conda-forge
ipykernel 5.5.0 py37heaed05f_1 conda-forge
ipython 7.21.0 py37heaed05f_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
jedi 0.18.0 py37h03978a9_2 conda-forge
jpeg 9d h8ffe710_0 conda-forge
jupyter_client 6.1.12 pyhd8ed1ab_0 conda-forge
jupyter_core 4.7.1 py37h03978a9_0 conda-forge
kiwisolver 1.3.1 py37h8c56517_1 conda-forge
lcms2 2.12 h2a16943_0 conda-forge
libblas 3.9.0 8_mkl conda-forge
libcblas 3.9.0 8_mkl conda-forge
libclang 11.1.0 default_h5c34c98_0 conda-forge
libiconv 1.16 he774522_0 conda-forge
liblapack 3.9.0 8_mkl conda-forge
libpng 1.6.37 h1d00b33_2 conda-forge
libsodium 1.0.18 h8d14728_1 conda-forge
libtiff 4.2.0 hc10be44_0 conda-forge
libxml2 2.9.10 hf5bbc77_3 conda-forge
libxslt 1.1.33 h65864e5_2 conda-forge
lxml 4.6.2 py37hd07aab1_1 conda-forge
lz4-c 1.9.3 h8ffe710_0 conda-forge
m2w64-gcc-libgfortran 5.3.0 6 conda-forge
m2w64-gcc-libs 5.3.0 7 conda-forge
m2w64-gcc-libs-core 5.3.0 7 conda-forge
m2w64-gmp 6.1.0 2 conda-forge
m2w64-libwinpthread-git 5.0.0.4634.697f757 2 conda-forge
matplotlib 3.3.4 py37h03978a9_0 conda-forge
matplotlib-base 3.3.4 py37h3379fd5_0 conda-forge
mkl 2020.4 hb70f87d_311 conda-forge
msys2-conda-epoch 20160418 1 conda-forge
numpy 1.20.1 py37hd20adf4_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openssl 1.1.1j h8ffe710_0 conda-forge
pandas 1.2.3 py37h08fd248_0 conda-forge
parso 0.8.1 pyhd8ed1ab_0 conda-forge
patsy 0.5.1 py_0 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.1.2 py37h96663a1_0 conda-forge
pip 21.0.1 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.17 pyha770c72_0 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pygments 2.8.1 pyhd8ed1ab_0 conda-forge
pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyqt 5.12.3 py37h03978a9_7 conda-forge
pyqt-impl 5.12.3 py37hf2a7229_7 conda-forge
pyqt5-sip 4.19.18 py37hf2a7229_7 conda-forge
pyqtchart 5.12 py37hf2a7229_7 conda-forge
pyqtwebengine 5.12.1 py37hf2a7229_7 conda-forge
pysocks 1.7.1 py37h03978a9_3 conda-forge
python 3.7.10 h7840368_100_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.7 1_cp37m conda-forge
pytz 2021.1 pyhd8ed1ab_0 conda-forge
pywin32 300 py37hcc03f2d_0 conda-forge
pyzmq 22.0.3 py37hcce574b_1 conda-forge
qt 5.12.9 h5909a2a_4 conda-forge
requests 2.25.1 pyhd3deb0d_0 conda-forge
scipy 1.6.0 py37h6db1a17_0 conda-forge
seaborn 0.11.1 hd8ed1ab_1 conda-forge
seaborn-base 0.11.1 pyhd8ed1ab_1 conda-forge
setuptools 49.6.0 py37h03978a9_3 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
soupsieve 2.0.1 py_1 conda-forge
sqlite 3.34.0 h8ffe710_0 conda-forge
statsmodels 0.12.2 py37hda49f71_0 conda-forge
tk 8.6.10 h8ffe710_1 conda-forge
tornado 6.1 py37hcc03f2d_1 conda-forge
traitlets 5.0.5 py_0 conda-forge
urllib3 1.26.3 pyhd8ed1ab_0 conda-forge
vc 14.2 hb210afc_4 conda-forge
vs2015_runtime 14.28.29325 h5e1d092_4 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
wheel 0.36.2 pyhd3deb0d_0 conda-forge
win_inet_pton 1.1.0 py37h03978a9_2 conda-forge
wincertstore 0.2 py37h03978a9_1006 conda-forge
xz 5.2.5 h62dcd97_1 conda-forge
zeromq 4.3.4 h0e60522_0 conda-forge
zlib 1.2.11 h62dcd97_1010 conda-forge
zstd 1.4.9 h6255e5f_0 conda-forge
```
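If `lxml` still can't be loaded, it can be sidestepped. `pd.read_html(str(table), flavor="bs4")` parses with BeautifulSoup instead, but as far as I know that flavor also requires `html5lib`, which isn't in the list above. A dependency-free alternative is a small parser built on the standard library's `html.parser`. The sketch below is a swapped-in technique, not what `read_html` does internally, and `TableRowsParser`/`table_rows` are hypothetical names; it collects the cell text of the first table it is given:

```python
from html.parser import HTMLParser


class TableRowsParser(HTMLParser):
    """Collect the cell text of each <tr> in the first <table> fed to it.

    Simplifications: rowspan/colspan are ignored, nested tables are not
    supported, and text from inline tags is concatenated as-is.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &amp; etc.
        self.rows = []
        self._in_table = False
        self._done = False
        self._row = None   # cell strings for the current <tr>
        self._cell = None  # text fragments for the current <td>/<th>

    def _flush_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace/newlines inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _flush_row(self):
        self._flush_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._flush_row()  # tolerate a missing </tr>
            self._row = []
        elif self._row is not None and tag in ("td", "th"):
            self._flush_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "tr":
            self._flush_row()
        elif tag == "table" and self._in_table:
            self._flush_row()
            self._in_table = False
            self._done = True  # stop after the first table

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html):
    """Return the first table in `html` as a list of rows of cell text."""
    parser = TableRowsParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

In the scraping chunk it could be called as `rows = table_rows(str(table))`, followed by `pd.DataFrame(rows[1:], columns=rows[0])` if every row has the same number of cells. Because it ignores `rowspan`/`colspan`, columns will shift on tables that use them.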
EDIT: For reasons unknown, and without my changing anything, it now works. I guess the system just needed one more reboot...