
I don't really know what to call this issue, sorry for the undescriptive title. My program checks whether an element exists on multiple paths of a website. It has a base URL, and the paths to check (IDs) are read from a JSON file (name.json) and appended to that base URL. In its current state, the program prints 1 if the element is found and 2 if it is not. I want it to print the URL instead of 1 or 2. My problem is that the URLs are built in a loop that finishes before the final for loop runs. So when I try to print fullurl there, I only get the URL for the last ID in my JSON file, printed multiple times, instead of each unique URL.

import json
import grequests
from bs4 import BeautifulSoup

idlist = json.loads(open('name.json').read())

baseurl = 'https://steamcommunity.com/id/'


complete_urls = []

for uid in idlist:
    fullurl = baseurl + uid
    complete_urls.append(fullurl)

rs = (grequests.get(fullurl) for fullurl in complete_urls)
resp = grequests.map(rs)

for r in resp:
    soup = BeautifulSoup(r.text, 'lxml')

    if soup.find('span', class_='actual_persona_name'):
        print('1')

    else:
        print('2')
  • List `complete_urls` contains all of your URLs. The variable `fullurl` contains only the most recent one. – DYZ Aug 16 '20 at 22:13
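The comment's point can be seen without any network calls. This minimal sketch uses hypothetical IDs in place of name.json and shows that the loop variable only keeps the last value, while the list keeps one URL per ID in order.

```python
baseurl = 'https://steamcommunity.com/id/'
idlist = ['alice', 'bob', 'carol']  # hypothetical IDs standing in for name.json

complete_urls = []
for uid in idlist:
    fullurl = baseurl + uid  # rebinds the same name on every iteration
    complete_urls.append(fullurl)

# After the loop, fullurl refers only to the last URL built...
for _ in complete_urls:
    print(fullurl)  # the .../carol URL, three times

# ...whereas the list still holds every URL, in the order the requests are made.
for url in complete_urls:
    print(url)
```

This is why the answers below look up the URL in `complete_urls` (or on the response) rather than reusing `fullurl`.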

2 Answers


Since grequests.map returns the responses in the same order as the requests, you can match each response to its URL in complete_urls by index, using enumerate.

import json
import grequests
from bs4 import BeautifulSoup

with open('name.json') as f:
    idlist = json.load(f)

baseurl = 'https://steamcommunity.com/id/'

complete_urls = []

for uid in idlist:
    fullurl = baseurl + uid
    complete_urls.append(fullurl)

rs = (grequests.get(fullurl) for fullurl in complete_urls)
resp = grequests.map(rs)

for index, r in enumerate(resp):  # use enumerate to get the index of each response
    print(complete_urls[index])  # the same index points at the matching URL in complete_urls
    if r is None:  # grequests.map puts None in place of a request that failed
        print('request failed')
        continue
    soup = BeautifulSoup(r.text, 'lxml')
    if soup.find('span', class_='actual_persona_name'):
        print('1')

    else:
        print('2')
idan
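Instead of indexing, zip can pair each URL with its result directly, which relies on the same ordering guarantee. The sketch below runs offline: the URLs and page bodies are hypothetical, a plain substring test stands in for BeautifulSoup's `soup.find(...)`, and `label_results` is an illustrative helper, not part of grequests or bs4.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def label_results(
    urls: Iterable[str],
    pages: Iterable[Optional[str]],
    has_element: Callable[[str], bool],
) -> List[Tuple[str, str]]:
    """Pair each URL with the page fetched for it and label the result.

    zip relies on urls and pages being in the same order, which holds for
    grequests.map because it returns responses in request order.
    """
    results = []
    for url, page in zip(urls, pages):
        if page is None:  # grequests.map yields None for a request that failed
            results.append((url, 'failed'))
        elif has_element(page):
            results.append((url, 'found'))
        else:
            results.append((url, 'not found'))
    return results


# Hypothetical URLs and page bodies so the sketch runs without network access.
urls = [
    'https://steamcommunity.com/id/alice',
    'https://steamcommunity.com/id/bob',
    'https://steamcommunity.com/id/carol',
]
pages = [
    '<span class="actual_persona_name">Alice</span>',
    '<p>No profile here</p>',
    None,
]

# A substring check stands in for the BeautifulSoup lookup in this sketch.
results = label_results(urls, pages, lambda html: 'actual_persona_name' in html)
for url, status in results:
    print(url, status)
```

With real responses you would pass `complete_urls`, `[r.text if r is not None else None for r in resp]`, and a predicate such as `lambda html: BeautifulSoup(html, 'lxml').find('span', class_='actual_persona_name') is not None`.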

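If I understood correctly, you can just print(r.url) instead of the numbers, since each response object stores the URL it was fetched from. Note that r.url is the final URL after any redirects, so it can differ from the URL you originally requested.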

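for r in resp:
    if r is None:  # failed request: grequests.map returns None, so there is no r.url
        continue
    soup = BeautifulSoup(r.text, 'lxml')

    if soup.find('span', class_='actual_persona_name'):
        print(r.url, 'found')

    else:
        print(r.url, 'not found')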
Lucas Godoy
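If a profile URL redirects, r.url shows where the request ended up rather than the URL built from name.json. In requests, `Response.history` lists the redirect responses oldest first, so its first entry carries the originally requested URL. The helper `requested_url` below is a hypothetical sketch of that idea; it uses SimpleNamespace stand-ins (with made-up URLs) for real Response objects so it runs offline.

```python
from types import SimpleNamespace


def requested_url(response):
    """Return the URL originally requested, even if the server redirected.

    response.history holds the redirect responses oldest first, so the
    first entry's .url is the URL that was actually requested.
    """
    if response.history:
        return response.history[0].url
    return response.url


# Stand-ins for requests.Response objects; the URLs are hypothetical.
direct = SimpleNamespace(url='https://steamcommunity.com/id/alice', history=[])
redirected = SimpleNamespace(
    url='https://steamcommunity.com/profiles/123',
    history=[SimpleNamespace(url='https://steamcommunity.com/id/bob', history=[])],
)

print(requested_url(direct))
print(requested_url(redirected))
```

In the loop above, `print(requested_url(r), 'found')` would then report the URL built from name.json even when the server redirected.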