Although this is most likely a newbie question, I struggled to find any information online to help with my problem.
My code is meant to scrape onion sites. Connecting to Tor works, and the web scraper works fine on its own, but when I combined the two code blocks I kept getting errors related to the keyword arguments in my code. Even removing them produces errors, and I am a bit lost about what I'm supposed to do.
import socket
import socks
import requests
from pywebcopy import save_webpage
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket
def get_tor_session():
    session = requests.session()
    # Tor uses the 9050 port as the default socks port
    session.proxies = {'http': 'socks5h://127.0.0.1:9050',
                       'https': 'socks5h://127.0.0.1:9050'}
    return session
session = get_tor_session()
print(session.get("http://httpbin.org/ip").text)
kwargs = {'project_name': 'site folder'}
save_webpage(
    # url of the website
    session.get(url="http://elfqv3zjfegus3bgg5d7pv62eqght4h6sl6yjjhe7kjpi2s56bzgk2yd.onion"),
    # folder where the copy will be saved
    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    **kwargs
)
In this case, I'm presented with the following error:
TypeError: Cannot mix str and non-str arguments
Attempting to replace
    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    **kwargs
with
    kwargs,
    project_folder=r"C:\Users\admin\Desktop\WebScraping"
presents me with this error:
TypeError: save_webpage() got multiple values for argument
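For reference, this second error seems to be reproducible without Tor or pywebcopy. The snippet below is only a minimal sketch: `save_page` is a hypothetical stand-in whose parameter order is an assumption, guessed from the `setup_config(url, project_folder, project_name, **kwargs)` call in the traceback, and it is not the real `save_webpage` signature.

```python
def save_page(url, project_folder=None, project_name=None, **kwargs):
    """Hypothetical stand-in; the parameter order is an assumption."""
    return url, project_folder, project_name, kwargs


options = {"project_name": "site folder"}


def call_with_positional_dict():
    """Pass the dict positionally, as in the modified call, and return the error text."""
    try:
        # The dict lands in the second positional slot (project_folder),
        # which is then supplied a second time by keyword.
        return save_page("http://example.onion", options, project_folder="out")
    except TypeError as exc:
        return str(exc)


print(call_with_positional_dict())

# Unpacking with ** turns the dict entries into keyword arguments instead.
print(save_page("http://example.onion", project_folder="out", **options))
```

If that assumption holds, passing `kwargs` without `**` would make the dict itself a positional argument, which would explain the "multiple values" message.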
Traceback for the first error:
  File "C:\Users\admin\Desktop\WebScraping\tor.py", line 43, in <module>
    **kwargs
  File "C:\Users\admin\anaconda3\lib\site-packages\pywebcopy\api.py", line 58, in save_webpage
    config.setup_config(url, project_folder, project_name, **kwargs)
  File "C:\Users\admin\anaconda3\lib\site-packages\pywebcopy\configs.py", line 189, in setup_config
    SESSION.load_rules_from_url(urljoin(project_url, '/robots.txt'))
  File "C:\Users\admin\anaconda3\lib\urllib\parse.py", line 487, in urljoin
    base, url, _coerce_result = _coerce_args(base, url)
  File "C:\Users\admin\anaconda3\lib\urllib\parse.py", line 120, in _coerce_args
    raise TypeError("Cannot mix str and non-str arguments")
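The last two frames suggest that `urljoin` received something other than a string as its base. A minimal sketch that appears to reproduce the same message with only the standard library is shown below. `FakeResponse` is a hypothetical stand-in for whatever `session.get()` returns, and `example.onion` is a placeholder address, so this is an assumption about the cause rather than a confirmed diagnosis.

```python
from urllib.parse import urljoin


class FakeResponse:
    """Hypothetical stand-in for the object returned by session.get()."""

    def __init__(self, url):
        self.url = url


def try_urljoin(base):
    """Return the joined URL, or the TypeError text if urljoin rejects the base."""
    try:
        return urljoin(base, "/robots.txt")
    except TypeError as exc:
        return f"TypeError: {exc}"


response = FakeResponse("http://example.onion/page")

# Passing the object itself mixes a non-str base with a str path.
print(try_urljoin(response))

# Passing the URL string taken from the object is accepted.
print(try_urljoin(response.url))
```

If this is what happens in my script, the first argument to `save_webpage` would probably need to be the URL string rather than the result of `session.get(...)`, though I have not confirmed how pywebcopy would then route its own requests through Tor.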
I'd really appreciate an explanation of what causes these errors and how to avoid them in the future.