
Although this is most likely a newbie question, I struggled to find any information online to help me with my problem.

My code is meant to scrape onion sites. Despite being able to connect to Tor, and the web scraper working fine as a stand-alone, when I tried combining both code blocks I kept getting numerous errors regarding the keyword argument in my code; even attempting to delete it presents me with bugs. I am a bit lost on what I'm supposed to do.

import socket
import socks
import requests
from pywebcopy import save_webpage

socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket

def get_tor_session():
    session = requests.session()
    # Tor uses port 9050 as the default SOCKS port
    session.proxies = {'http':  'socks5h://127.0.0.1:9050',
                       'https': 'socks5h://127.0.0.1:9050'}
    return session


session = get_tor_session()
print(session.get("http://httpbin.org/ip").text)
  
kwargs = {'project_name': 'site folder'}

save_webpage(
    # url of the website
    session.get(url="http://elfqv3zjfegus3bgg5d7pv62eqght4h6sl6yjjhe7kjpi2s56bzgk2yd.onion"),

    # folder where the copy will be saved
    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    **kwargs
)

In this case, I'm presented with the following error:

TypeError: Cannot mix str and non-str arguments

Attempting to replace

project_folder=r"C:\Users\admin\Desktop\WebScraping",
**kwargs

with

kwargs, 
project_folder=r"C:\Users\admin\Desktop\WebScraping"

presents me with this error:

TypeError: save_webpage() got multiple values for argument

Traceback for the first error:

  File "C:\Users\admin\Desktop\WebScraping\tor.py", line 43, in <module>
    **kwargs

  File "C:\Users\admin\anaconda3\lib\site-packages\pywebcopy\api.py", line 58, in save_webpage
    config.setup_config(url, project_folder, project_name, **kwargs)

  File "C:\Users\admin\anaconda3\lib\site-packages\pywebcopy\configs.py", line 189, in setup_config
    SESSION.load_rules_from_url(urljoin(project_url, '/robots.txt'))

  File "C:\Users\admin\anaconda3\lib\urllib\parse.py", line 487, in urljoin
    base, url, _coerce_result = _coerce_args(base, url)

  File "C:\Users\admin\anaconda3\lib\urllib\parse.py", line 120, in _coerce_args
    raise TypeError("Cannot mix str and non-str arguments")

I'd really appreciate an explanation of what causes such a bug, and how to avoid it in the future.

AanTuning
  • Welcome to SO. You use `**kwargs` when you define a function, not when you use it. – ewokx Feb 07 '22 at 02:07
  • Looking at the example usage for `pywebcopy`, the function `save_webpage` expects a string for the `url` keyword parameter, not a `Response` object. Why are you making an HTTP GET request there? That's probably where the initial type error is coming from. – Paul M. Feb 07 '22 at 02:11
  • It's great that you posted the error, but post the full traceback message so we can easily spot the failing line. – tdelaney Feb 07 '22 at 03:02
  • @ewong - you can expand dictionaries into keyword arguments when calling a function. – tdelaney Feb 07 '22 at 03:09
  • Apologies, I missed that, i edited to add the traceback for the first error. – AanTuning Feb 07 '22 at 03:12
  • If so, when should I make an HTTP GET request? @Paul M. – AanTuning Feb 07 '22 at 03:23
  • The problem is in pywebcopy - I'm not familiar with that code, but it seems like `save_webpage` wants a URL (string) as its first parameter, but you are doing a `session.get`, which returns a response object. This confuses urllib, which is expecting a string. – tdelaney Feb 07 '22 at 03:23
  • What module do you recommend I use instead of pywebcopy, Beautiful Soup maybe? @tdelaney – AanTuning Feb 07 '22 at 03:26
  • Not sure. The docs for pywebcopy are at https://pypi.org/project/pywebcopy/. I think the first step is to make sure you are using it right. I haven't used it but if you look at `1.5 Authentication and Cookies` it seems like you want to configure its session instead of using your own requests.session. – tdelaney Feb 07 '22 at 03:31
  • I have read the documentation; correct me if I'm wrong, but doesn't authentication aid in scraping websites that require it? This is certainly useful, but I don't believe it has any relation to my issue, which I suspect is caused by a misconfiguration between the keyword argument and the code @tdelaney – AanTuning Feb 07 '22 at 03:39
  • You are setting up a requests session to handle proxies, but it seems from the documentation that you want to configure pywebcopy's own session info instead of trying to pass in your own. That section was an example for authentication, but it may be a hint about how to configure proxies (a sketch of this idea follows these comments). – tdelaney Feb 07 '22 at 03:43
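
A sketch of the idea from the last comment above, assuming that the module-level SESSION visible in the traceback (pywebcopy/configs.py) behaves like a requests.Session and honors requests-style proxy settings; neither is confirmed against the pywebcopy documentation:

import pywebcopy.configs as configs

# Assumption: pywebcopy's shared SESSION is requests.Session-like
# (the traceback calls SESSION.load_rules_from_url on it), so setting
# socks5h proxies here should route its internal requests through Tor.
configs.SESSION.proxies = {
    'http':  'socks5h://127.0.0.1:9050',
    'https': 'socks5h://127.0.0.1:9050',
}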

2 Answers


Not sure why this hasn't been answered yet. As mentioned in my comment, simply change this:

save_webpage(
    # url of the website
    session.get(url=...),

    # folder where the copy will be saved            
    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    **kwargs
)

To:

save_webpage(
    # url of the website
    url=...,

    # folder where the copy will be saved            
    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    **kwargs
)

save_webpage makes the request internally, so it expects the URL as a string rather than a Response object from session.get.
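
For completeness, a corrected end-to-end sketch built from the question's own values (it keeps the global SOCKS patch from the question so that the request save_webpage makes internally also goes through Tor; note that resolving .onion names may additionally require the getaddrinfo patch from the answer below, since the local resolver cannot handle them):

import socket
import socks
from pywebcopy import save_webpage

# Global patch from the question: every new socket is routed
# through the local Tor SOCKS proxy on port 9050.
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket

save_webpage(
    # Pass the URL itself; save_webpage fetches it internally.
    url="http://elfqv3zjfegus3bgg5d7pv62eqght4h6sl6yjjhe7kjpi2s56bzgk2yd.onion",
    # folder where the copy will be saved
    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    project_name="site folder",
)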

Paul M.

SOLVED

Adding the following code resolved the issue:

def getaddrinfo(*args):
    # args[0] is the host, args[1] the port; 6 == IPPROTO_TCP.
    # Returning the hostname unresolved skips the local DNS lookup.
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, '', (args[0], args[1]))]

socket.getaddrinfo = getaddrinfo
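
Presumably this works because socket.create_connection calls socket.getaddrinfo before the SOCKS socket ever sees the hostname, and the local DNS resolver cannot resolve .onion names; the patch hands the hostname back unresolved so the Tor proxy resolves it instead. With the patch above in place, a lookup that would otherwise raise socket.gaierror now passes the name straight through (the hostname is illustrative):

print(socket.getaddrinfo("example.onion", 80))
# [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('example.onion', 80))]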
AanTuning
  • would you be able to explain a bit further the answer? – user7440787 Feb 07 '22 at 14:26
  • Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – Community Feb 07 '22 at 14:27