The API is JSON-based, so the contents of the html files won't give you any clue on where to find the files. A good idea when exploring web services like this one, is to open the Network tab in Chrome's developer tools and see what pages it loads when interacting with the page. That exercise showed me that two urls in particular seem interesting:
- http://www.youtube-mp3.org/api/pushItem/?item=http%3A//www.youtube.com/watch%3Fv%3DKMU0tzLwhbE&xy=trve&r=1314700829128
- http://www.youtube-mp3.org/api/itemInfo/?video_id=KMU0tzLwhbE&adloc=&r=1314700829314
The first url appears to be queuing a file for processing, the second to get the status of the processing job.
The second url takes a video_id GET parameter that is the id for the video on youtube (http://www.youtube.com/watch?v=KMU0tzLwhbE) and returns the status of the decoding job. The second and third seem irrelevant for this purpose which you can verify by test loading the url with and without the extra parameters.
The content of the page is:
info = { "title" : "Developers",
"image" : "http://i4.ytimg.com/vi/KMU0tzLwhbE/default.jpg",
"length" : "3", "status" : "serving", "progress_speed" : "",
"progress" : "", "ads" : "",
"h" : "a0aa17294103c638fa7f5e0606f839d3" };
Which happens to be JSON data. The interesting bit in this is "a0aa17294103c638fa7f5e0606f839d3" which looks like a hash that the web service use to refer to the decoded mp3 file. Also check out how the download link on the front page looks:
http://www.youtube-mp3.org/get?video_id=KMU0tzLwhbE&h=a0aa17294103c638fa7f5e0606f839d3
Now we have all the missing pieces of the puzzle together. First, we take the url of a youtube video (http://www.youtube.com/watch?v=iKP7DZmqdbU) url quote it and feed it to the api using this url:
http://www.youtube-mp3.org/api/pushItem/?item=http%3A//www.youtube.com/watch%3Fv%3DiKP7DZmqdbU&xy=trve
Then, wait a few moments until the decoding job is done:
http://www.youtube-mp3.org/api/itemInfo/?video_id=iKP7DZmqdbU
Take the hash found in the info url to construct the download url:
http://www.youtube-mp3.org/get?video_id=iKP7DZmqdbU&h=2e4b61b6ddc8bf83f5a0e4e4ee0635bb
Note that it is possible that the web master of the site does not want to be scraped and will take counter measures if people starts to (in the webmasters eyes) abuse the site. For example it seem to use referer protection so clicking the links in this post won't work, you have to copy them and load them in a new browser window.
Test code:
from re import findall
from time import sleep
from urllib import urlopen, quote
yt_code = 'gijypDkEqUA'
yt_url = 'http://www.youtube.com/watch?v=%s' % yt_code
push_url_fmt = 'http://www.youtube-mp3.org/api/pushItem/?item=%s&xy=trve'
info_url_fmt = 'http://www.youtube-mp3.org/api/itemInfo/?video_id=%s'
download_url_fmt = 'http://www.youtube-mp3.org/get?video_id=%s&h=%s'
push_url = push_url_fmt % quote(yt_url)
data = urlopen(push_url).read()
sleep(10)
info_url = info_url_fmt % yt_code
data = urlopen(info_url).read()
res = findall('"h" : "([^"]*)"', data)
download_url = download_url_fmt % (yt_code, res[0])
print 'Download here:', download_url