
I would like to play a webcam video in a video view, but the "camurl" in the JSON response from Weather Underground is a link to an HTML page, like so:

http://www.wunderground.com/webcams/cadot1/902/show.html

The URL of the video I need to play is embedded in that page's HTML and looks like this:

https://www.wunderground.com/webcams/cadot1/1216/video.html?month=11&year=2016&filename=current.mp4

Is there a way to get that URL from the "camurl" in the JSON response? I've heard of "HTML scraping"; can it be used to extract the embedded video URL from the HTML page that "camurl" points to?

This is what the full json response looks like for the webcam:

    {
    "handle": "mahouser",
    "camid": "mahouserCAM1",
    "camindex": "1",
    "assoc_station_id": "KCACAMAR18",
    "link": "http://",
    "linktext": "Michael Houser",
    "cameratype": "Foscam FI9900P",
    "organization": "",
    "neighborhood": "Camarillo Hills",
    "zip": "93010-12",
    "city": "CAMARILLO",
    "state": "CA",
    "country": "US",
    "tzname": "America/Los_Angeles",
    "lat": "34.24947357",
    "lon": "-119.03993988",
    "updated": "2016-11-10 20:57:24",
    "updated_epoch": "",
    "downloaded": "2016-11-08 20:38:48",
    "isrecent": "1",
    "CURRENTIMAGEURL": "http://icons.wunderground.com/webcamramdisk/m/a/mahouser/1/current.jpg?t=1478812080",
    "WIDGETCURRENTIMAGEURL": "http://icons.wunderground.com/webcamramdisk/m/a/mahouser/1/widget.jpg?t=1478812080",
    "CAMURL": "http://www.wunderground.com/webcams/mahouser/1/show.html"
    }

I've looked at jsoup and read the documentation but can't figure out how to get the needed url. Here is how the url looks in the html:

    <td class="day">
    <div class="row">
    <div class="small-2 medium-5 columns">
    <a href="/history/airport/KAJO/2016/11/15/DailyHistory.html" class="day-num">
    15
    </a>
    </div>
    <div class="small-10 medium-7 columns">
    <img src="//icons.wxug.com/i/c/v4/clear.svg" alt="Clear" class="right" />
    </div>
    </div>
    <div class="calThumb">
    <a href="http://icons.wunderground.com/webcamramdisk/c/a/cadot1/902/current.jpg?1479239986" rel="lightbox[webcam]" title="">
    <img src="http://icons.wunderground.com/webcamramdisk/c/a/cadot1/902/current-thumb.jpg?1479239986" width="100" height="75" alt="" title="Click to view the time-lapse video for this day." />
    </a>
    </div>
    <p><a href="video.html?month=11&year=2016&filename=current.mp4" class="videoText">View Video</a></p>
    </td>

How can I get that "current.mp4" url from within the html code?

Steve C.

1 Answer


There are a lot of possible ways, but here is a simple solution:

  1. Retrieve the html code with jsoup:

    Document doc = Jsoup.connect("http://www.wunderground.com/webcams/cadot1/902/show.html").get();
    
  2. Then, retrieve all elements with the class videoText:

    Elements elements = doc.getElementsByClass("videoText");
    

    This will give you a list of entries. Now simply select the one that ends with current.mp4.

  3. To retrieve the current.mp4 URL:

    for (Element link : elements) {
        String linkHref = link.attr("href");
        // linkHref contains something like video.html?month=11&year=2016&filename=current.mp4
        if (linkHref.endsWith("current.mp4")) {
            // this is the link for the current video; store or return it here
        }
    }
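
Note that the `href` read in step 3 is relative to the page it was found on, so it has to be resolved against the page URL before it can be requested. Here is a minimal sketch of that step using only `java.net.URI`; the class name `WebcamLinks` and its two helper methods are hypothetical names made up for this illustration, not part of jsoup. If the document was loaded with `Jsoup.connect(...).get()`, calling `link.absUrl("href")` instead of `link.attr("href")` should give the same absolute URL.

```java
import java.net.URI;

// Hypothetical helper class for illustration; not part of jsoup.
final class WebcamLinks {
    private WebcamLinks() {}

    /** True when the href's query string names current.mp4 as the file. */
    static boolean isCurrentVideo(String href) {
        return href.endsWith("filename=current.mp4");
    }

    /** Resolves a relative href against the URL of the page it was found on. */
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "http://www.wunderground.com/webcams/cadot1/902/show.html";
        String href = "video.html?month=11&year=2016&filename=current.mp4";
        if (isCurrentVideo(href)) {
            // Prints the absolute URL of the video page
            System.out.println(resolve(pageUrl, href));
        }
    }
}
```

The resolved URL still points at `video.html`, which is an HTML page, not at the mp4 file itself.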
    
Manuel Allenspach
  • That works to get all of the "videoText" elements from a single HTML file, but what about when there is an array of HTML links that each need to be scraped for a specific file link, i.e., current.mp4? Would jsoup be able to quickly retrieve the needed mp4 file link from each HTML link returned in the JSON array? – Steve C. Nov 16 '16 at 22:43
  • You answered my question above, so thank you. I'll mark this as the answer although I ran into a different problem as a result of this answer. I should've paid attention. The resulting video url from this method is just that, a url. I assumed I would get the link to the video so I could stream it within the app. – Steve C. Nov 16 '16 at 23:48