
I am trying to simply get the price for the security shown at https://investor.vanguard.com/529-plan/profile/4514 . I run this code:

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Firefox(executable_path=r'C:\Program_Files_EllieTheGoodDog\Geckodriver\geckodriver.exe')
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

When I "inspect element" the price in the selenium-opened Firefox, I clearly see this:

<span data-ng-if="!data.isLayer" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" class="ng-scope ng-binding arrange">$42.91</span>

But that data is NOT in my soup. If I print my soup, the HTML is quite different from what the browser shows. I tried this, but it fails:

myspan = soup.find_all('span', attrs={'data-ng-if': '!data.isLayer', 'data-ng-bind-html': 'data.value', 'data-ng-class': '{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}', 'class': 'ng-scope ng-binding arrange'})

I am totally stumped. If anyone could point me in the right direction, I would really appreciate it. I sense I am missing something, possibly several things...

undetected Selenium
  • 183,867
  • 41
  • 278
  • 352
  • The `data-*` values are accessible via `dataset` https://developer.mozilla.org/en/docs/Web/API/HTMLElement/dataset – IdiakosE Sunday Feb 16 '19 at 06:02
  • Apologies, but I don't understand what that means. I am sure this is just another indication that I don't know what I am doing! But thank you. – Ellie The Good Dog Feb 16 '19 at 06:34
  • Not really, just that attributes starting with `data-` are accessible via `dataset`. For instance `` is accessible via `document.querySelector('input#ease').dataset.value` – IdiakosE Sunday Feb 16 '19 at 09:47

2 Answers


There is nothing wrong with the way you are using the data-* attributes and their values to select the span. In fact, it is the method described in the documentation. Four span tags match all of those attributes, and find_all returns all of them. The second one holds the price.

What you missed is that the span takes some time to load, and the page source is returned before that happens. You can explicitly wait for that span and then get the page source. Here I am using an XPath to wait for the element. You can get the XPath from the browser's inspect tool: right-click the element -> Copy -> Copy XPath.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
# Block for up to 10 seconds until the price span is present in the DOM
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '/html/body/div[1]/div[3]/div[3]/div[1]/div/div[1]/div/div/div/div[2]/div/div[3]/div[1]/div/div/table/tbody/tr[1]/td[2]/div/span[1]')))
# Read the page source only after the wait, so it contains the span
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
myspan = soup.find_all('span', attrs={'data-ng-if': '!data.isLayer', 'data-ng-bind-html': 'data.value', 'data-ng-class': '{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}', 'class': 'ng-scope ng-binding arrange'})
print(myspan)          # all matching spans
print(myspan[1].text)  # the second match is the price

Output

[<span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">Unit price as of 02/15/2019</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">$42.91</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">Change</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer"><span class="number-positive">$0.47</span> <span class="number-positive">1.11%</span></span>]
$42.91
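
To make the timing issue concrete, here is a minimal sketch of the poll-until-ready idea behind an explicit wait. It is only an illustration in plain Python, not Selenium's implementation: `wait_until` and `make_fake_page` are hypothetical names, and the fake page simply pretends that the price appears on the third lookup.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


def make_fake_page(ready_after_calls):
    """Hypothetical stand-in for a page whose price shows up only after a few lookups."""
    calls = {"count": 0}

    def find_price():
        calls["count"] += 1
        # Until the page has finished rendering, the element is simply absent.
        return "$42.91" if calls["count"] >= ready_after_calls else None

    return find_price


if __name__ == "__main__":
    find_price = make_fake_page(ready_after_calls=3)
    # Looking once, immediately, is like reading page_source too early.
    print(find_price())
    # Polling keeps looking until the value is there (or the timeout expires).
    print(wait_until(find_price, timeout=2.0, poll_interval=0.01))
```

The single immediate lookup finds nothing, while the polling call returns the price once it becomes available. That is the same difference as between reading `driver.page_source` right after `driver.get` and reading it after the `WebDriverWait` call in the code above.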
Bitto

Selenium alone can be sufficient to extract the desired text. You need to use WebDriverWait with the visibility_of_element_located expected condition, as in the following solution:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
    driver.get('https://investor.vanguard.com/529-plan/profile/4514')
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='ng-scope']//td[@class='ng-scope right']//span[@class='ng-scope ng-binding arrange' and @data-ng-bind-html]"))).get_attribute("innerHTML"))
    
  • Console Output:

    $42.91
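
The value printed above is still a string with a currency symbol. If a number is needed for further calculation, a conversion along these lines might help. This is only a sketch: `parse_price` is a hypothetical helper, not part of Selenium or BeautifulSoup, and it assumes US-style formatting (a leading `$`, optional thousands commas, a decimal point).

```python
import re
from decimal import Decimal

# Assumed format: "$" followed by digits, optional thousands commas, optional decimals.
PRICE_PATTERN = re.compile(r"^\s*\$(\d{1,3}(?:,\d{3})*|\d+)(\.\d+)?\s*$")


def parse_price(text):
    """Convert a scraped string such as '$42.91' into a Decimal.

    Raises ValueError if the text does not look like a dollar amount,
    which also guards against picking up the wrong element.
    """
    match = PRICE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a dollar amount: {text!r}")
    whole = match.group(1).replace(",", "")
    fraction = match.group(2) or ""
    return Decimal(whole + fraction)


if __name__ == "__main__":
    print(parse_price("$42.91"))
```

Decimal is used instead of float so that the price keeps exactly the digits shown on the page.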
    
undetected Selenium