[To open the example URLs you need to be logged in to Shazam]
So I'm writing a script that downloads my Shazam history so that I can manipulate it and write playlists to other services. I can't easily parse the history directly from http://www.shazam.com/myshazam because the page reloads its content with a lot of JavaScript, and I guess that would be a harder problem to solve. That's why I want to work with the downloadable file instead, which you can find here: http://www.shazam.com/myshazam/download-history
I'm trying to find a way to do this, but I'm running into some problems.
1st, I was planning to use urlretrieve:
import urllib
urllib.urlretrieve("http://www.shazam.com/myshazam/download-history", "myshazam-history.html")
but I'm not even sure that's going to work at all, because when I try to download that file there is no actual URL path like http://www.shazam.com/myshazam/download-history/myshazam-history.html (that gives you a 404 error). Instead, when you hit that URL it immediately redirects to http://www.shazam.com and the browser shows its download prompt.
The 2nd problem is that I still need to hold the session cookies, and I don't know how to pass them to urlretrieve to test whether it works. Below is some test code I wrote that logs in, holds the session, and then parses a web page.
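import cookielib
import urllib
import urllib2
from bs4 import BeautifulSoup  # with BeautifulSoup 3: from BeautifulSoup import BeautifulSoup

def LoginFB(username, password):
    # The opener stores session cookies in a CookieJar and resends them on later requests.
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
    url = "https://www.facebook.com/login.php?skip_api_lo....allthe_loginshazam_stuff)"
    # URL-encode the credentials and send them as the POST body.
    data = urllib.urlencode({"email": username, "pass": password})
    socket = opener.open(url, data)
    return socket, opener

def shazamParse(opener):
    # Fetch the history page through the logged-in opener and pretty-print its HTML.
    url = "http://www.shazam.com/myshazam/"
    content = opener.open(url).read()
    soup = BeautifulSoup(content)
    finalParse = soup.prettify()
    return finalParse.encode("utf-8")

(socket, opener) = LoginFB("email", "password")
shazamParse(opener)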
What I want to do is hit the download URL as a logged-in user (holding the session cookies), download the file into memory, put its contents into a string, and then parse it with BeautifulSoup. It's exactly the same approach as my shazamParse function, except that I'd be reading from a string with the contents of the myshazam-history.html file.
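To make the idea concrete, here is a rough sketch of the "read it into a string instead of a file" step. It is only an illustration under some assumptions: it uses the Python 3 module names (urllib.request and http.cookiejar) rather than urllib2 and cookielib, the helper names build_session_opener and fetch_to_string are made up for this example, and it assumes the opener is already logged in and that the download URL returns the file body directly, which the redirect I described above may or may not allow.

```python
import http.cookiejar
import urllib.request


def build_session_opener():
    """Return an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch_to_string(opener, url, default_encoding="utf-8"):
    """Fetch url through opener and return the body as a string (nothing is written to disk)."""
    response = opener.open(url)
    try:
        raw = response.read()
        # Prefer the charset declared in the Content-Type header, if there is one.
        charset = response.headers.get_content_charset() or default_encoding
    finally:
        response.close()
    return raw.decode(charset, errors="replace")


# Hypothetical usage, after logging in with the same opener:
#   opener = build_session_opener()
#   content = fetch_to_string(opener, "http://www.shazam.com/myshazam/download-history")
#   soup = BeautifulSoup(content)
```

The point of reusing one opener is that any cookies set during login sit in its CookieJar and get sent along with the download request, which is the part I don't know how to do with urlretrieve.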
Any ideas or hints on how I can do this?