
I am writing a Java program that connects to this site, simulates entering a URL into the text field and pressing the convert button, and then obtains the generated download URL. I inspected the site's network traffic, and the request for the download information shows the following data under the Headers tab:

After pressing the download button I see...

**General:**
Remote Address:64.233.171.121:80
Request URL:http://www.youtube-mp3.org/a/itemInfo/?video_id=KMU0tzLwhbE&ac=www&t=grp&r=1439422557443&s=63079
Request Method:GET
Status Code:200 OK
**Response Headers:**
Cache-Control:no-cache
Content-Encoding:gzip
Content-Length:249
Content-Type:text/html; charset=utf-8
Date:Wed, 12 Aug 2015 23:35:57 GMT
Server:Google Frontend
Vary:Accept-Encoding
**Request Headers:**
Accept:*/*
Accept-Encoding:gzip, deflate, sdch
Accept-Language:en-US,en;q=0.8
Accept-Location:*
Cache-Control:no-cache
Connection:keep-alive
Cookie:_ga=GA1.2.1715601918.1425946204; ux=cce7b6d7-c6b9-11e4-8ef7-5557045ab030|0|0|1439422547|1439854547|3536a31fad07ba73ecc1e4ba4b3cf3d6; __utmt=1; __utma=120311424.1715601918.1425946204.1439420885.1439421591.16; __utmb=120311424.3.10.1439421591; __utmc=120311424; __utmz=120311424.1439421591.16.16.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)
Host:www.youtube-mp3.org
Referer:http://www.youtube-mp3.org/
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.130 Safari/537.36
**Query String Parameters:**
video_id:KMU0tzLwhbE
ac:www
t:grp
r:1439422557443
s:63079

And under the "Response" tab in inspect element (under Network) it shows:

info = {"status": "serving", "h2": "c0c0d91ef9ca62df13a43f71c6ad8ea3", "image": "http://i.ytimg.com/vi/KMU0tzLwhbE/default.jpg", "progress_speed": "", "ads": "", "title": "Developers", "h": "fa5ef79cbdee0bb33da3348818b26715", "px": "", "ts_create": "1439422557", "length": "3", "r": "NzMuMjE1LjIxMC4w", "pf": "", "progress": ""};
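For illustration, here is a minimal sketch of how a body shaped like the one above could be turned into a map of fields once it has been fetched. It uses only the standard library and a regular expression, and it assumes every value is a plain quoted string with no escaped quotes, as in the sample. The class name `InfoParser` and the method `parse` are hypothetical names chosen for this example; a real JSON library would be the more robust choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InfoParser {
    // Matches "key": "value" pairs where both sides are plain quoted strings
    // (assumption: no escaped quotes inside keys or values).
    private static final Pattern PAIR =
            Pattern.compile("\"([^\"]+)\"\\s*:\\s*\"([^\"]*)\"");

    /** Extracts the flat string fields from a body shaped like: info = {...}; */
    static Map<String, String> parse(String body) {
        int start = body.indexOf('{');
        int end = body.lastIndexOf('}');
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("No object found in response body");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(body.substring(start, end + 1));
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Abbreviated copy of the response shown above.
        String body = "info = {\"status\": \"serving\", "
                + "\"h\": \"fa5ef79cbdee0bb33da3348818b26715\", "
                + "\"ts_create\": \"1439422557\"};";
        Map<String, String> info = parse(body);
        System.out.println(info.get("status"));
        System.out.println(info.get("h"));
    }
}
```

With Jsoup, the string passed to `parse` would be the response body as text (`res.body()`) rather than a parsed `Document`, since the payload is a JavaScript assignment and not HTML.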

Now I am trying to obtain this data so that I can take the necessary parameters from it and build the download URL. However, I am not sure how to access this information. I created a Jsoup GET request like this:

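    import java.io.IOException;
    import java.util.Scanner;
    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    Connection.Response res = null;
    Document doc = null;
    String id;
    String vidID;
    String downloadURL;
    Scanner input = new Scanner(System.in);

    System.out.print("URL: ");
    id = input.nextLine();
    // Assumes the video ID is the last 11 characters of the URL
    vidID = id.substring(id.length() - 11);

    try {
        res = Jsoup.connect("http://www.youtube-mp3.org/")
                .referrer("http://www.youtube-mp3.org/")
                .header("Accept", "*/*")
                .data("video_id", vidID)
                .method(Connection.Method.GET)
                .execute();
        // not sure how to proceed
    } catch (IOException e) {
        // execute() throws IOException on network or HTTP errors
        e.printStackTrace();
    }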

But that is not working out for me. I can extract the video_id parameter from the user's input, so that part is fine. However, I cannot work out where the remaining parameters in the "Request URL:" field come from. How can I construct a request that connects to the site and passes in my URL? And after doing so, how would I obtain the information listed under the Network/Response tab in inspect element?
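As a side note on the video ID, the `substring` call in the snippet above only works when the ID happens to be the last 11 characters of the input. A slightly more tolerant sketch is shown below. It assumes IDs are 11 characters drawn from letters, digits, `-` and `_`, and that the input is either a `watch?v=` style URL or a `youtu.be/` short link; `VideoIdExtractor` and `extract` are hypothetical names used for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VideoIdExtractor {
    // Assumption: IDs are 11 characters from [A-Za-z0-9_-], found either
    // after a "v=" query parameter or after "youtu.be/".
    private static final Pattern ID =
            Pattern.compile("(?:[?&]v=|youtu\\.be/)([A-Za-z0-9_-]{11})");

    /** Returns the video ID, or null if the input does not match either form. */
    static String extract(String url) {
        Matcher m = ID.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Trailing parameters no longer break the extraction.
        System.out.println(extract("https://www.youtube.com/watch?v=KMU0tzLwhbE&feature=share"));
    }
}
```

Returning `null` for unrecognised input lets the caller prompt again instead of sending a malformed `video_id` to the site.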

Thanks for any advice

EDIT:

So the request URL looks something like this: (Note: This is the request leading up to the page with the download URL info under the Response tab, so I first need to do a request to the following URL)

http://www.youtube-mp3.org/a/itemInfo/?video_id=KMU0tzLwhbE&ac=www&t=grp&r=1439422557443&s=63079

I know the video_id, and I'm assuming the "ac" value is always www and the "t" value is always grp. However, the "r" and "s" parameters seem to be generated differently for each video, and I cannot yet see how.
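To make the shape of that request concrete, here is a rough sketch of a helper that assembles the itemInfo URL from its parts. It assumes "ac" and "t" are fixed at `www` and `grp`, as guessed above, and that "r" is a millisecond timestamp: the captured value 1439422557443 corresponds to 23:35:57 GMT on 12 Aug 2015, which matches the `Date` response header and the `ts_create` field. How the site derives "s" is not known here, so the caller has to supply it. `ItemInfoUrl` and `build` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ItemInfoUrl {
    static final String BASE = "http://www.youtube-mp3.org/a/itemInfo/";

    /**
     * Builds the request URL. "ac" and "t" are assumed constant; "r" is
     * assumed to be a millisecond timestamp; "s" must come from the caller.
     */
    static String build(String videoId, long r, String s) {
        try {
            return BASE
                    + "?video_id=" + URLEncoder.encode(videoId, "UTF-8")
                    + "&ac=www&t=grp"
                    + "&r=" + r
                    + "&s=" + URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Reproduces the captured request from its recorded parameters.
        System.out.println(build("KMU0tzLwhbE", 1439422557443L, "63079"));
        // For a fresh request, "r" could be System.currentTimeMillis(),
        // with "s" still to be worked out.
    }
}
```

The resulting string can be handed to `Jsoup.connect(...)` in place of the site root, which leaves "s" as the only missing piece.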

UPDATE: After further probing, it looks like I need to do the following to accomplish what I need:

  1. Make a GET request to a URL of the following form:

    http://www.youtube-mp3.org/a/itemInfo/?video_id=KMU0tzLwhbE&ac=www&t=grp&r=1439425642030&s=5534

while passing in values for "item" (no problem), "el" (no problem), "bf" (no problem), "r" (need to get), and "s" (need to get).

  2. Make another GET request to the pre-download URL, which has the following form:

    http://www.youtube-mp3.org/a/itemInfo/?video_id=KMU0tzLwhbE&ac=www&t=grp&r=1439425642030&s=5534

while passing in values for "video_id" (no problem), "ac" (no problem), "t" (no problem), "r" (need to get), and "s" (need to get).

  3. Obtain the response data and construct the download URL.
mlz7
  • The `r` parameter appears to be the current time since epoch in milliseconds. – FThompson Aug 13 '15 at 00:48
  • @Vulcan hmm didn't think of that. Is there a way I could obtain this value and then pass it in the URL? – mlz7 Aug 13 '15 at 00:50
  • `System.currentTimeMillis()` returns time since epoch in milliseconds. – FThompson Aug 13 '15 at 01:00
  • Regarding your question as a whole, however, it may be worthwhile to approach this differently and simulate clicking the "Convert video" button using [HtmlUnit](http://stackoverflow.com/q/3897871/1247781) and then scraping the resulting HTML for the download link. That way, you won't need to worry about any back-end information required by the webpage, as you could simply simulate how a human user would use that web interface. – FThompson Aug 13 '15 at 01:00
  • @Vulcan The problem is that I am planning on eventually adding this to an Android app, and Android unfortunately does not support HtmlUnit. I've been looking for alternative dynamic web parsers, but I can't find any for Android. – mlz7 Aug 13 '15 at 01:04
  • @Vulcan do you know of any libraries which could work for this purpose? – mlz7 Aug 13 '15 at 01:06
