I'm scraping some information, and my code is below:

from bs4 import BeautifulSoup
import requests

url = "https://www.privateproperty.com.ng/property-for-sale"
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find_all('div', class_="similar-listings-item sponsored-listing")

for result in results:
    Title = result.find('div', class_= "similar-listings-info").text.replace('\n','')
    location = result.find( class_= "listings-location").text.replace('\n','')
    Price = result.find('div', class_= "similar-listings-price").text.replace('\n','')
    

info = (Title, location, Price)
print(info)

Why does this line

results = soup.find_all('div', class_="similar-listings-item sponsored-listing") 

return only the 1st element?

Adriaan
  • `Soup.find_all()` returns a list. If you want a list-of-lists, you'll have to process those results and build it yourself. – John Gordon Jan 30 '23 at 02:28
  • Thank you John, please how can I do that? – Franklin Ekwem Jan 30 '23 at 02:37
  • It sounds like you're asking me to write your program for you... – John Gordon Jan 30 '23 at 02:41
  • Yes, John Gordon, could you please help me with code that gets a list of lists from the function above? – Franklin Ekwem Jan 30 '23 at 02:45
  • what's the html you're working with, and what output do you want to extract from it? the question is quite vague, but maybe you want to do something like [this](https://pastebin.com/ZnZ7xM6u) with [list comprehension](https://www.programiz.com/python-programming/list-comprehension) – Driftr95 Jan 30 '23 at 11:14

1 Answer

Why does this line

results = soup.find_all('div', class_="similar-listings-item sponsored-listing") 

return only the 1st element?

I'm getting 2 elements, but maybe you're only seeing the last result because the `info = ...` and `print(info)` lines are after the loop instead of inside it. Indent them so that they run inside the loop and print every result.
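
To illustrate the difference, here is a minimal sketch that needs no network access. The rows are made-up placeholders standing in for the scraped (Title, location, Price) values, and the names rows, inside_loop, and after_loop are only for illustration:

```python
# Made-up placeholder rows standing in for scraped (Title, location, Price) values.
rows = [
    ("Sample listing A", "Sample location A", "Sample price A"),
    ("Sample listing B", "Sample location B", "Sample price B"),
]

inside_loop = []
for title, location, price in rows:
    info = (title, location, price)
    inside_loop.append(info)  # indented: runs once per row

after_loop = info  # not indented: runs once, and info still holds the last row

print(inside_loop)  # both rows
print(after_loop)   # only the last row
```

Anything at the loop's indentation level runs for every result; anything dedented runs once, after the loop has finished.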


If your issue is that you want all the listings, you should note that only the sponsored listings have the sponsored-listing class. To get all the listings, you can try using

results = soup.find_all('div', {'class': "similar-listings-item"}) ## OR
# results = soup.select('div.similar-listings-item')

[Use soup.select('div.similar-listings-item:not(.sponsored-listing)') if you only want unsponsored listings. Check out how to use .select with CSS selectors for more details.]
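
As a rough sketch of how these three queries differ, the snippet below runs them against a tiny hand-written HTML fragment. Only the class names come from the question; the fragment itself and the variable names are made up for illustration, and the real page may be structured differently:

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (an assumption, not the real page's HTML).
html = """
<div class="similar-listings-item sponsored-listing">Sponsored</div>
<div class="similar-listings-item">Regular 1</div>
<div class="similar-listings-item">Regular 2</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-word class_ string is compared against the exact class attribute value,
# so this only matches the div whose class is exactly that string.
sponsored = soup.find_all("div", class_="similar-listings-item sponsored-listing")

# A single class name matches every div that has that class, whatever else it has.
all_items = soup.find_all("div", {"class": "similar-listings-item"})

# The CSS :not() pseudo-class excludes the sponsored ones.
unsponsored = soup.select("div.similar-listings-item:not(.sponsored-listing)")

print(len(sponsored), len(all_items), len(unsponsored))  # 1 3 2
```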


I want to extract list of lists from the (variable)

Which variable? If you want a list of the Title, location, and Price for each result, initialize an empty list [like infoList] before the loop, then indent info=... so that it runs inside the loop, and append info to infoList at the end of the loop body (still inside the loop). Something like

infoList = []
for result in results:
    Title = result.find('div', class_= "similar-listings-info").text.replace('\n','')
    location = result.find( class_= "listings-location").text.replace('\n','')
    Price = result.find('div', class_= "similar-listings-price").text.replace('\n','')

    info = (Title, location, Price) # this is a tuple, btw
    # infoList.append(info) # --> list of tuples
    infoList.append([Title, location, Price]) # --> list of lists
    # print(info) # will print for every result
print(info) # will print ONLY the LAST result

Btw, it's not very safe to chain .find and .text like that. If .find doesn't find anything, it returns None, and trying to get .text from None raises an AttributeError. To be more cautious, you should check that .find returned something first.

You could use my selectForList function, like infoList = [selectForList(result, ['div.similar-listings-info', 'p.listings-location', 'div.similar-listings-price']) for result in results], or [since you want to remove the \ns, and in case you don't want to use CSS selectors] use a variation of it:

def get_min_text(containerTag, elName, classAttr, defaultVal=None):
    el = containerTag.find(elName, class_=classAttr)
    if el is None: return defaultVal
    return ' '.join(el.get_text(' ').split()) # split+join minimizes whitespace

results = soup.find_all('div', {'class': "similar-listings-item"}) 
infoList = [[get_min_text(result, *c[:3]) for c in [
    ('div', 'similar-listings-info'), # Title
    ('p', 'listings-location'), # Location
    ('div', 'similar-listings-price') # Price
]] for result in results]
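
To see why the split+join step is used instead of .replace('\n',''), here is a small standalone sketch. The raw string is a made-up example of the kind of text a tag might contain, not actual output from the site:

```python
# Made-up sample text with newlines and extra spaces, as often found inside tags.
raw = "\n  4 Bedroom\n  Detached Duplex \n"

# Removing only the newlines leaves the leading, trailing, and doubled spaces.
stripped_newlines = raw.replace("\n", "")

# split() with no arguments splits on any run of whitespace (spaces, tabs, newlines),
# and join() puts exactly one space between the pieces.
minimized = " ".join(raw.split())

print(repr(stripped_newlines))  # '  4 Bedroom  Detached Duplex '
print(repr(minimized))          # '4 Bedroom Detached Duplex'
```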

Driftr95