
Please note: I'm very inexperienced and this is my first 'real' project.

I'll try to explain my problem as best I can; apologies if some of the terms are incorrect.

I'm trying to scrape the following webpage - https://www.eaab.org.za/agent_agency_search?type=Agents&search_agent=+&submit_agent_search=GO

I can scrape the 'Name' and 'Status', but I also need to get some of the information in the 'Full Details' popup window.

I have noticed that when clicking on the 'Full Details' button the URL stays the same.

Below is what my code looks like:

import scrapy
from FirstScrape.items import FirstscrapeItem

class FirstSpider(scrapy.Spider):
    name = "spiderman"
    start_urls = [
        
        "https://www.eaab.org.za/agent_agency_search?type=Agents&search_agent=+&submit_agent_search=GO"
        
        ]
    
    def parse(self, response):
        item = FirstscrapeItem()
        item['name'] = response.xpath("//tr[@class='even']/td[1]/text()").get()
        item['status'] = response.xpath("//tr[@class='even']/td[2]/text()").get()
        #first refers to firstname in the popup window
        item['first'] = response.xpath("//div[@class='result-list default']/tbody/tr[2]/td[2]/text()").get()
        
        
        return item

I launch my code from the terminal and export it to a .csv file.

Not sure if this will help but this is the popup / fancy box window:

[image: popup window]

Do I need to use Selenium to click on the button or am I just missing something? Any help will be appreciated.

I'm very eager to learn more about Python and scraping.

Thank you.

2 Answers


This is the URL you need to extract from your starting page:

<a href="/listing_detail.php?agents_id=169039" class="agent-detail">Full Detail</a>

To get the content of the pop-up window, send a second request to this extracted URL. The href is relative, so join it with the site's base URL first.
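
As a rough, Scrapy-free illustration of that step, the sketch below pulls the href out of every anchor with the agent-detail class and resolves it against the search page URL using only the standard library. The names DetailLinkParser and detail_urls and the sample HTML string are made up for the example; inside a spider you would normally get the same result with a selector plus response.urljoin.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

SEARCH_URL = (
    "https://www.eaab.org.za/agent_agency_search"
    "?type=Agents&search_agent=+&submit_agent_search=GO"
)


class DetailLinkParser(HTMLParser):
    """Collects href values from <a class="agent-detail"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "agent-detail" in classes and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def detail_urls(html, base_url=SEARCH_URL):
    """Return absolute URLs for every 'Full Detail' link found in html."""
    parser = DetailLinkParser()
    parser.feed(html)
    # urljoin resolves the relative href against the page it was found on
    return [urljoin(base_url, href) for href in parser.hrefs]


# Sample anchor taken from the search page
sample = '<a href="/listing_detail.php?agents_id=169039" class="agent-detail">Full Detail</a>'
print(detail_urls(sample))
```

Each URL in the returned list is what you would request to get the pop-up content for one agent.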

Lukas Poustka

The 'Full Detail' link has an href attribute. You need to extract that URL and request it. Maybe this helps:

import scrapy


class FirstSpider(scrapy.Spider):
    name = "spiderman"
    start_urls = [
        "https://www.eaab.org.za/agent_agency_search?type=Agents&search_agent=+&submit_agent_search=GO"
    ]

    def parse(self, response):
        # Each "Full Detail" link carries the relative URL of the popup content.
        for href in response.css(".agent-detail::attr(href)").getall():
            # response.follow resolves the relative href against the page URL.
            yield response.follow(href, callback=self.parse_data)

    def parse_data(self, response):
        # Print the text of every table cell on the detail page.
        print(response.css("td::text").getall())
        print("-----------------------------------")
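
The printed list is flat, so to get something like the first name out of it you still have to pair labels with values. Here is a minimal sketch, assuming the detail table has two cells per row (label, then value). That is what the tr[2]/td[2] XPath in the question suggests, but I haven't checked it against the live page. The function pair_cells and the sample labels are invented for illustration.

```python
def pair_cells(cells):
    """Turn a flat [label, value, label, value, ...] list into a dict."""
    cleaned = [c.strip() for c in cells]
    # Pair even-indexed items (labels) with odd-indexed items (values);
    # a trailing unpaired cell is dropped by zip.
    return {
        label.rstrip(":"): value
        for label, value in zip(cleaned[0::2], cleaned[1::2])
        if label
    }


# Made-up sample in the assumed label/value layout
cells = ["First Name:", " Jane ", "Surname:", "Doe"]
print(pair_cells(cells))
```

Keep in mind that td::text skips cells with no text at all, which would throw the pairing off. If that happens on the real page, selecting row by row (one tr at a time) is the safer route.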
dimay