I have an issue where extracting from this link

http://www.bursamalaysia.com/market/listed-companies/company-announcements/#/?category=FA&sub_category=FA1&alphabetical=All&company=5250

brings me data from this link instead, which is the main page itself: http://www.bursamalaysia.com/market/listed-companies/company-announcements/#/?category=all

Any idea why this is occurring? I am using PhantomJS, Selenium and Beautiful Soup to assist me in this.

# The standard library modules
import os
import sys
import re
import sqlite3
import locale
# The wget module
import wget
import time
import calendar
from datetime import datetime
# The BeautifulSoup module
from bs4 import BeautifulSoup

# The selenium module
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By


def getURLS(url):
    driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true'])
    driver.get(url) # load the web page
    src = driver.page_source
    #Get text and split it
    soup = BeautifulSoup(src, 'html5lib')

    print soup

link = 'http://www.bursamalaysia.com/market/listed-companies/company-announcements/#/?category=FA&sub_category=FA1&alphabetical=All&company=5250'
getURLS(link)

Solution, based on Alex Lucaci's answer:

from selenium.webdriver.support.ui import Select

def getURLS(url):
    driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true'])
    driver.get(url) # load the web page
    time.sleep(5) # wait for the page to render before touching the form
    category_select = Select(driver.find_element_by_xpath('//*[@id="bm_announcement_types"]'))
    category_select.select_by_visible_text("Financial Results")
    category_select2 = Select(driver.find_element_by_xpath('//*[@id="bm_sub_announcement_types"]'))
    category_select2.select_by_visible_text("Financial Results")
    category_select3 = Select(driver.find_element_by_xpath('//*[@id="bm_company_list"]'))
    category_select3.select_by_visible_text("7-ELEVEN MALAYSIA HOLDINGS BERHAD (5250)")
    driver.find_element_by_xpath('//*[@id="bm_company_announcements_search_form"]/input[1]').click()
    time.sleep(5) # wait for the filtered results to load
    src = driver.page_source
    soup = BeautifulSoup(src, 'html5lib')

link = "http://www.bursamalaysia.com/market/listed-companies/company-announcements/#/?category=all"
getURLS(link)
Napmi

1 Answer


When you save the source, the page has not completely loaded yet after your submitted request, so try waiting a couple of seconds before fetching the page source:

def getURLS(url):
    driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true'])
    driver.get(url) # load the web page
    time.sleep(5) # waiting for 5 seconds before fetching the source
    src = driver.page_source
    # Get text and split it
    soup = BeautifulSoup(src, 'html5lib')

    print soup
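A fixed sleep works but can still be flaky. Since the question already imports WebDriverWait and expected_conditions, an explicit wait is a more robust alternative; a minimal sketch, assuming the form id bm_company_announcements_search_form (taken from the solution code above) is a reasonable readiness signal:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from bs4 import BeautifulSoup

    def getURLS(url):
        driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true'])
        driver.get(url)
        # Block for up to 10 seconds until the search form is in the DOM,
        # instead of sleeping a fixed amount of time.
        # Assumption: 'bm_company_announcements_search_form' is the form id
        # used elsewhere in this question.
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, 'bm_company_announcements_search_form'))
        )
        soup = BeautifulSoup(driver.page_source, 'html5lib')
        print soup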

To perform the dropdown select you first have to import the Select class: from selenium.webdriver.support.ui import Select. Then you select the dropdown element like this:

category_select = Select(driver.find_element_by_xpath('//*[@id="bm_announcement_types"]'))
category_select.select_by_visible_text('Financial Results')
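If the visible option text ever changes, Select also provides select_by_value, which matches the option's value attribute instead. A sketch, assuming the value attributes mirror the query-string parameters from the question's URL (category=FA, company=5250), which is a guess; company_select here stands for a Select wrapped around the bm_company_list dropdown:

    # Assumption: the <option> value attributes match the URL query string.
    category_select.select_by_value('FA')    # instead of 'Financial Results'
    company_select.select_by_value('5250')   # instead of the full company name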

In my example I've done it for the -Category- dropdown; follow the exact same steps for every dropdown. Note that selecting the dropdown by XPath is the best way, and you can get the XPath using Google Chrome -> right click on the element -> Inspect -> right click on the <select> in the panel that appears -> Copy -> Copy XPath.
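As an aside, since these <select> elements all carry stable id attributes (bm_announcement_types, bm_sub_announcement_types, bm_company_list, visible in the XPaths above), driver.find_element_by_id('bm_announcement_types') should work just as well as a copied XPath and is less brittle if the page layout shifts.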

Once you've selected all the elements, you have to click Submit, wait a couple of seconds for the results to load, and after that fetch the source code.
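Putting that together, a minimal sketch of the final step, reusing the submit-button XPath from the solution code above (the 5-second sleep is an arbitrary choice):

    # Click Submit, give the filtered results time to render, then parse.
    driver.find_element_by_xpath('//*[@id="bm_company_announcements_search_form"]/input[1]').click()
    time.sleep(5)  # arbitrary wait for the results to load
    soup = BeautifulSoup(driver.page_source, 'html5lib')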

Let me know if my answer helped you.

Alex Lucaci
  • hey, sorry mate. I tested it and printed the soup out; it's still displaying data from the main page instead of the filtered values. Is there any way to make it go through its dropdown select values instead? – Napmi Jul 07 '17 at 09:32
  • thanks mate! A bit of googling on how to hit submit, plus your newfound knowledge on find_element_by_xpath, and it works! Thank you so much! – Napmi Jul 11 '17 at 08:43