
How do I convert a string in a list into a URL? I tried `url.parse`, but it didn't work.

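!pip install selenium
from urllib.parse import urlparse
from urllib.parse import quote
from urllib.request import urlopen
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Selenium 4 takes the chromedriver path through a Service object
browser = webdriver.Chrome(service=Service('./chromedriver.exe'))
wait = WebDriverWait(browser, 5)
output = []
for i in range(1, 2):  # Only page 1 here; raise the upper bound to scrape more pages
    browser.get("https://tw.mall.yahoo.com/search/product?p=%E5%B1%88%E8%87%A3%E6%B0%8F&pg={}".format(i))

    # Wait until the product grid has been rendered
    wait.until(EC.presence_of_element_located((By.XPATH, "//ul[@class='gridList']")))

    product_links = browser.find_elements(By.XPATH, "//ul[@class='gridList']/li/a")

    for link in product_links:
        print(link.get_attribute('href'))
        # Note: this appends a one-element list, not the href string itself
        output.append([link.get_attribute('href')])

for b in output[:3]:
    print(b)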

That is the full code. I'm trying to turn each string into a URL, but it doesn't work.

  • Where is `url` defined? – James Nov 26 '21 at 15:40
  • There is no `url` module in the standard library. There is `urllib`, a package with a parsing submodule: `import urllib.parse`, then calling `urllib.parse.urlparse` works. Also, `b` is assigned a list with 1 element, so you'll want to reference it as `b[0]`. Those other two lists aren't assigned to any variable; they will be discarded as soon as they are created. – tdelaney Nov 26 '21 at 15:51
  • Your snippet is still not reproducible and is very different from your original snippet. – Tibebes. M Nov 26 '21 at 15:52
  • Wait a second, that's a completely different script and it doesn't even use `url.parse`. That invalidates all of the answers so far. – tdelaney Nov 26 '21 at 15:53
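Following tdelaney's comment, here is a minimal sketch of the fix for the scraper's output: each row of `output` is a one-element list, so take `row[0]` before parsing. The three URLs are copied from the answers in this thread and stand in for real scraped data (an assumption; the page is not fetched here).

```python
import urllib.parse

# Stand-in for the scraper's `output`: a list of one-element lists,
# because the loop calls output.append([href]).
output = [
    ['https://tw.mall.yahoo.com/item/p033088522688'],
    ['https://tw.mall.yahoo.com/item/p0330103147501'],
    ['https://tw.mall.yahoo.com/item/p033097510324'],
]

# Unwrap each row to get the plain URL string.
urls = [row[0] for row in output]

# urlparse expects a string, so pass the unwrapped value, not the row.
parsed = [urllib.parse.urlparse(u) for u in urls]

for p in parsed:
    print(p.netloc, p.path)
```

Alternatively, appending the string itself in the scraping loop (`output.append(link.get_attribute('href'))`) avoids the unwrapping step entirely.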

4 Answers


I think what you're trying to do is:

# import the parser
from urllib.parse import urlparse

# the links as a list of one-element lists (note the commas between rows)
b = [['https://tw.mall.yahoo.com/item/p033088522688'],
     ['https://tw.mall.yahoo.com/item/p0330103147501'],
     ['https://tw.mall.yahoo.com/item/p033097510324']]

# go through each element of the list and parse the string inside it
for row in b:
    print(urlparse(row[0]))
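Note that `urlparse` does not produce a separate "URL type"; it returns a `ParseResult` named tuple whose fields can be read individually, and `geturl()` turns it back into a string. A short sketch using the first URL from the question:

```python
from urllib.parse import urlparse

parsed = urlparse('https://tw.mall.yahoo.com/item/p033088522688')

# ParseResult is a named tuple with six fields:
# scheme, netloc, path, params, query, fragment.
print(parsed.scheme)   # https
print(parsed.netloc)   # tw.mall.yahoo.com
print(parsed.path)     # /item/p033088522688

# geturl() rebuilds the URL string from the parsed parts.
print(parsed.geturl())  # https://tw.mall.yahoo.com/item/p033088522688
```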
Madar

That's not a list of URLs. You can define the list like this:

b = ['https://tw.mall.yahoo.com/item/p033088522688', 'https://tw.mall.yahoo.com/item/p0330103147501', 'https://tw.mall.yahoo.com/item/p033097510324']

After that, iterate over the list with a for loop to get each string, and parse the string as a URL inside the loop. Of course, you first need to import the `urlparse` function.

from urllib.parse import urlparse  # Python 3 location; the standalone urlparse module was Python 2 only

b = ['https://tw.mall.yahoo.com/item/p033088522688', 'https://tw.mall.yahoo.com/item/p0330103147501', 'https://tw.mall.yahoo.com/item/p033097510324']

for el in b:
    parsedUrl = urlparse(el)
    # do something with parsedUrl

You can find more about the `urlparse` module (the Python 2 name of `urllib.parse`) here: https://pymotw.com/2/urlparse/
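As one hypothetical example of "do something with parsedUrl", the sketch below pulls the trailing product ID out of each path. `item_id` is a name introduced here for illustration, and treating the last path segment as the product ID is an assumption based only on the three sample URLs.

```python
from urllib.parse import urlparse

b = ['https://tw.mall.yahoo.com/item/p033088522688', 'https://tw.mall.yahoo.com/item/p0330103147501', 'https://tw.mall.yahoo.com/item/p033097510324']


def item_id(link):
    """Hypothetical helper: return the last segment of the URL path."""
    path = urlparse(link).path  # e.g. '/item/p033088522688'
    return path.rstrip('/').rsplit('/', 1)[-1]


ids = [item_id(el) for el in b]
print(ids)  # ['p033088522688', 'p0330103147501', 'p033097510324']
```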

J0ker98

Well, first of all, you get:

NameError: name 'url' is not defined

This error is raised because the name `url` was never bound: either you're not importing the correct library, or there is no object named `url`.

Your variable b is a list of strings, from which you can access any element using b[index] where index is the position of the string in the list (e.g. b[0] results in https://tw.mall.yahoo.com/item/p033088522688 etc).
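A small sketch reproducing the error and the fix: using `url` before anything defines it raises `NameError`, while importing `urlparse` and indexing with `b[0]` works. The list `b` reuses URLs from the question as sample data.

```python
# Reproduce the error: no object named `url` has been defined or imported.
try:
    url.parse('https://tw.mall.yahoo.com/item/p033088522688')  # noqa: F821
except NameError as exc:
    error_message = str(exc)
    print(error_message)  # name 'url' is not defined

# Importing the parser binds a usable name, and b[0] is a plain string.
from urllib.parse import urlparse

b = ['https://tw.mall.yahoo.com/item/p033088522688', 'https://tw.mall.yahoo.com/item/p0330103147501']
print(urlparse(b[0]).path)  # /item/p033088522688
```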

nTerior

In Python you define a list by putting a comma-separated sequence of items between square brackets. Where you went wrong is that, rather than listing the URLs separated by commas, you defined each one as its own list.

The reason for your current error (as alluded to in the comments) is likely that `url` was never imported or defined.

from urllib import parse as url  # bind the urllib.parse module to the name `url`
urls = ['https://tw.mall.yahoo.com/item/p033088522688','https://tw.mall.yahoo.com/item/p0330103147501','https://tw.mall.yahoo.com/item/p033097510324']
for url_string in urls:
    print(url.urlparse(url_string))  # urlparse is the function inside the module
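Calling `urllib.parse` directly, as `url.parse(...)` does when `url` is bound to `urllib`, fails because `urllib.parse` is a module rather than a function; the callable is `urlparse` inside it. A minimal sketch showing the difference:

```python
import types
import urllib.parse

link = 'https://tw.mall.yahoo.com/item/p033088522688'

# urllib.parse is a module object, not a function.
print(isinstance(urllib.parse, types.ModuleType))  # True

try:
    urllib.parse(link)  # calling a module raises TypeError
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# The function to call lives inside the module.
print(urllib.parse.urlparse(link).netloc)  # tw.mall.yahoo.com
```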
jhylands