
I'm trying to get the number of plays for the top songs of several artists on Spotify, using Python and Splinter.

If you fill in the username and password below with yours, you should be able to run the code.

from splinter import Browser
import time
from bs4 import BeautifulSoup

browser = Browser()
url = 'http://play.spotify.com'
browser.visit(url)
time.sleep(2)
button = browser.find_by_id('has-account')
button.click()
time.sleep(1)
browser.fill('username', 'your_username')
browser.fill('password', 'your_password')
buttons = browser.find_by_css('button')
visible_buttons = [button for button in buttons if button.visible]
login_button = visible_buttons[-1]
login_button.click()
time.sleep(1)
browser.visit('https://play.spotify.com/artist/5YGY8feqx7naU7z4HrwZM6')
time.sleep(10)

So far, so good. If you look at the Firefox window, you can see Miley Cyrus's artist page, including the number of plays for the top tracks.

If you open up the Firefox Developer Tools Inspector and hover, you can see the name of the song in .tl-highlight elements, and the number of plays in .tl-listen-count elements. However, I've found it impossible (at least on my machine) to access these elements using splinter. Moreover, when I try to get the source for the entire page, the elements that I can see by hovering my mouse over them in Firefox don't show up in what is ostensibly the page source.

html = browser.html
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid bs4's "no parser specified" warning
output = soup.prettify()
with open('miley_cyrus_artist_page.html', 'w') as output_f:
    output_f.write(output)
browser.quit()

I don't think I know enough about web programming to know what the issue is here--Firefox sees all the DOM elements clearly, but Splinter, which is driving Firefox, does not.

Michael K
  • Why not use the Spotify API instead? Thanks. – alecxe Jul 15 '15 at 19:03
  • These numbers are not currently available through the Spotify API: http://stackoverflow.com/questions/31430851/get-play-count-from-top-songs-on-spotify – Michael K Jul 15 '15 at 19:04

2 Answers


The key problem is that the artist's page, with its list of tracks, lives inside an iframe. You need to switch into its context before searching for elements:

frame = browser.driver.find_element_by_css_selector("iframe[id^=browse-app-spotify]")
browser.driver.switch_to.frame(frame)
alecxe
  • That looks promising--when I execute that code, I get no frame, but if I change "frame" to "iframe" and run the second line, I hit a TypeError that looks like: ` is not JSON serializable` – Michael K Jul 15 '15 at 19:43
  • @MichaelK could you try once again? I've updated the code to use the Selenium bindings directly through `browser.driver`. Thanks. – alecxe Jul 15 '15 at 19:44
  • OK, that executes fine, but I still can't find the elements that I'm looking for, and running `browser.html` still yields the html from the outer container. – Michael K Jul 15 '15 at 19:55
  • @MichaelK how do you locate these elements? – alecxe Jul 15 '15 at 20:04
  • Okay, still not sure why the solution is the way it is, but I have something that works. Thanks a ton. – Michael K Jul 15 '15 at 20:10
  • @MichaelK aside from switching to the iframe, you may need to use explicit waits to wait for the elements to load. The Spotify website is quite dynamic. – alecxe Jul 15 '15 at 20:11
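
The explicit-wait idea from the last comment can be sketched as a plain polling loop. This is only an illustration: `wait_until` is a hypothetical helper, not part of Splinter or Selenium (Selenium ships its own `WebDriverWait` for the same purpose), and the default timeout and poll interval are arbitrary assumptions.

```python
import time


def wait_until(condition, timeout=30.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result. Hypothetical helper, not a
    Splinter or Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll_interval)
```

Once the browser is inside the right iframe, the condition could be something like `lambda: len(iframe.find_by_css('.tl-listen-count')) > 0`, which would replace a fixed `time.sleep(...)` with a wait that returns as soon as the play counts are present.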
0

Many thanks to @alecxe. The following code works to pull the information on the artist.

from splinter import Browser
import time
from bs4 import BeautifulSoup
import codecs

browser = Browser()
url = 'http://play.spotify.com'
browser.visit(url)
time.sleep(2)
button = browser.find_by_id('has-account')
button.click()
time.sleep(1)
browser.fill('username', 'your_username')
browser.fill('password', 'your_password')
buttons = browser.find_by_css('button')
visible_buttons = [button for button in buttons if button.visible]
login_button = visible_buttons[-1]
login_button.click()
time.sleep(1)
browser.visit('https://play.spotify.com/artist/5YGY8feqx7naU7z4HrwZM6')
time.sleep(30)

# Index of the iframe that held the artist page in my session; it may differ in yours.
CORRECT_FRAME_INDEX = 6
with browser.get_iframe(CORRECT_FRAME_INDEX) as iframe:
    html = iframe.html
    soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid bs4's warning
    output = soup.prettify()
    with codecs.open('test.html', 'w', 'utf-8') as output_f:
        output_f.write(output)
browser.quit()
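
Once the iframe's HTML is available, the song names and play counts mentioned in the question could be pulled out along these lines. This is a rough sketch, not tested against the live site: `extract_play_counts` is a hypothetical helper, and it assumes that every track row has exactly one `.tl-highlight` and one `.tl-listen-count` element, in matching document order.

```python
from bs4 import BeautifulSoup


def extract_play_counts(html):
    """Return (song title, play count text) pairs found in the given HTML.

    Assumption: titles (.tl-highlight) and counts (.tl-listen-count)
    appear once per track and in the same order, so they can be zipped.
    Counts are returned as the raw text shown on the page.
    """
    soup = BeautifulSoup(html, "html.parser")
    titles = [el.get_text(strip=True) for el in soup.select(".tl-highlight")]
    counts = [el.get_text(strip=True) for el in soup.select(".tl-listen-count")]
    return list(zip(titles, counts))
```

It would be called as `extract_play_counts(iframe.html)` inside the `with browser.get_iframe(...)` block, or on the contents of the saved `test.html`.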
Michael K