
I'm struggling to scrape a table (from steamcommunity.com) that is dynamically loaded through JavaScript. I'm using a combination of Python's Splinter and the headless browser PhantomJS.

Here is what I already came up with:

from splinter import Browser

browser = Browser('phantomjs')

url = 'https://steamcommunity.com/market/listings/730/%E2%98%85%20Karambit%20%7C%20Blue%20Steel%20(Battle-Scarred)'

browser.visit(url)
print(browser.is_element_present_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]', wait_time=5))
price_table = browser.find_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]/table/tbody/tr')

print(price_table)
print(price_table.first)
print(price_table.first.text)
print(price_table.first.value)
browser.quit()

The call to is_element_present_by_xpath() ensures that the table I'm interested in has loaded. Then I try to access the rows of that table.

As I understand from the Splinter documentation, the .find_by_xpath() method returns an ElementList, which is essentially a normal list with some aliases provided.
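To make that concrete, here is a rough sketch of what "a normal list with some aliases" means; this mimics the documented ElementList behavior with a plain list subclass and is not the real Splinter class:

```python
# Hypothetical sketch of a list subclass with .first/.last aliases,
# mimicking the ElementList behavior described in the docs
# (this is NOT Splinter's actual implementation).
class ElementListSketch(list):
    @property
    def first(self):
        # Alias for the first element, same as self[0]
        return self[0]

    @property
    def last(self):
        # Alias for the last element, same as self[-1]
        return self[-1]

rows = ElementListSketch(['row1', 'row2', 'row3'])
print(rows.first)  # row1
print(rows[0])     # row1 -- plain indexing works too
```

So anything that works on a list (indexing, slicing, `for` loops) should also work on the result of find_by_xpath().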

price_table is an ElementList of all the rows of the table. The last two prints give empty results, and I can't find any reason why the text method returns an empty string.

How could the elements of that table be accessed?

Marcs
Ivan

2 Answers


Have you tried `for i in price_table` yet? The source states that ElementList extends Python's list, so you should be able to iterate over price_table directly.

Edit: also, this is the first time I've heard of Splinter; it looks like it's just an abstraction over the Selenium Python package. If you get stuck, you can look at the Selenium docs, which are better written.

from splinter import Browser

browser = Browser('phantomjs')

url = 'https://steamcommunity.com/market/listings/730/%E2%98%85%20Karambit%20%7C%20Blue%20Steel%20(Battle-Scarred)'

browser.visit(url)
print(browser.is_element_present_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]', wait_time=5))
price_table = browser.find_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]/table/tbody/tr')

for row in price_table:
    print(row)
    print(row.text)

browser.quit()

I tried the code with different browsers and always got empty text, but I found the expected data in the HTML, so it may simply be a bug in Splinter.

from splinter import Browser

# browser = Browser('firefox')
# browser = Browser('phantomjs')

# Browser('chrome', executable_path='/usr/bin/chromium-browser') raised an error,
# but pointing at chromedriver works:
browser = Browser('chrome')  # executable_path='/usr/bin/chromedriver'

url = 'https://steamcommunity.com/market/listings/730/%E2%98%85%20Karambit%20%7C%20Blue%20Steel%20(Battle-Scarred)'   

browser.visit(url)

print(browser.is_element_present_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]', wait_time=5))

price_table = browser.find_by_xpath('//*[@id="market_commodity_buyreqeusts_table"]/table/tbody/tr')

for row in price_table:
    print('row html:', row.html)
    print('row text:', row.text)  # empty, for some reason
    for col in row.find_by_tag('td'):
        print('  col html:', col.html)
        print('  col text:', col.text)  # empty here too

browser.quit()
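Since row.text comes back empty but row.html contains the data, one workaround is to pull the text out of the HTML fragment yourself. A minimal sketch using only the standard library's html.parser; the extract_text helper below is my own, not part of Splinter:

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Accumulates the text nodes of an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.parts.append(stripped)

def extract_text(fragment):
    """Return the visible text of an HTML fragment, space-joined."""
    collector = _TextCollector()
    collector.feed(fragment)
    return ' '.join(collector.parts)

# A row shaped roughly like the buy-order table:
print(extract_text('<td>$107.50</td><td>2</td>'))  # $107.50 2
```

With this, `extract_text(row.html)` recovers the row's text even when `row.text` is empty.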
furas
  • Thank you! I'll use `.html` to extract the data. Do you think using `selenium` instead of `splinter` would be easier? – Ivan Nov 04 '16 at 15:19
  • I have never used `splinter` and I use `selenium` only occasionally, but they seem similar. Selenium may have more information and tutorials on the internet. – furas Nov 04 '16 at 16:29
  • I've run into a stability issue with the `splinter` + `phantomjs` combination. It works great when it scrapes 1-10 links, but when I gave it 30 links it consistently fails to find the table on a page somewhere around the 24th-25th link. Do you have any idea how this can be managed? Or should I switch to `selenium`? – Ivan Nov 11 '16 at 23:32
  • Create a new question with all the information: error message, URLs, etc. But first search SO and Google; maybe someone has had the same problem before. Also look at the HTML you get on that 24th-25th page: maybe there is no table, or the server sends a message about bots and a captcha, or you have to wait longer for the JavaScript-generated data. – furas Nov 12 '16 at 01:16
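The retry idea discussed in these comments can be sketched generically. Here is a hedged example with the fetch step abstracted out; the `scrape_once` wiring in the commented section is purely illustrative, not an established API:

```python
def retry(fetch, attempts=3):
    """Call fetch() until it returns a non-empty result or attempts run out."""
    for _ in range(attempts):
        rows = fetch()
        if rows:
            return rows
    return []  # still empty: inspect the raw HTML for captchas, bot checks, etc.

# Hypothetical wiring with Splinter (all names here are illustrative):
# def scrape_once():
#     browser = Browser('phantomjs')
#     try:
#         browser.visit(url)
#         if browser.is_element_present_by_xpath(TABLE_XPATH, wait_time=10):
#             return [row.html for row in browser.find_by_xpath(TABLE_XPATH)]
#         return []
#     finally:
#         browser.quit()
#
# rows = retry(scrape_once)
```

Restarting the browser between attempts (rather than reusing one instance across all 30 links) also keeps a single wedged PhantomJS process from poisoning the rest of the run.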