I need to get the video link from a web page. When I open Inspect Element and go to the Network tab, I can see the link I need. But how can I access this link through Python?

This is the situation: https://i.stack.imgur.com/qH26K.jpg

The link appears in the headers:

https://i.stack.imgur.com/2XtUM.jpg

I only need the link; I don't need to download the video.

What would be the best approach? Maybe Selenium?

Natko Kraševac

2 Answers

Selenium will work, yes. What you'll want to do is find the element in the DOM that's pulling it in. Before you go that route though, you should try to figure out what element you're after manually. You're probably after a video tag and its child source tag.

HTML 5 video tag docs: http://www.w3schools.com/tags/tag_video.asp

Selenium selector docs: https://selenium-python.readthedocs.org/locating-elements.html
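
Before setting up Selenium, the manual check described above can be tried on the static HTML. The following is a minimal sketch, assuming the page really does contain a video tag with a child source tag (the comments below suggest this particular page uses a Flash player instead, in which case it will find nothing). The names VideoSourceFinder and find_video_sources are made up for this illustration, and only the standard library is used.

```python
from html.parser import HTMLParser


class VideoSourceFinder(HTMLParser):
    """Collect src attributes of <video> tags and of <source> tags inside them."""

    def __init__(self):
        super().__init__()
        self.video_depth = 0   # > 0 while the parser is inside a <video> element
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "video":
            self.video_depth += 1
            if attrs.get("src"):
                self.sources.append(attrs["src"])
        elif tag == "source" and self.video_depth and attrs.get("src"):
            self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "video" and self.video_depth:
            self.video_depth -= 1


def find_video_sources(html):
    """Return every video URL found in <video>/<source> tags of the given HTML text."""
    finder = VideoSourceFinder()
    finder.feed(html)
    return finder.sources
```

If this returns an empty list for the page in question, the URL is most likely injected by a script or a plugin, and inspecting the DOM alone will not be enough.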

wholevinski
  • Thank you sir, but it is not an HTML5 video, it is an mp4 file embedded with Flash. This is the site http://www.rtl.hr/rtl-sada/gastro/tri-dva-jedan-kuhaj/tri-dva-jedan-kuhaj-sezona2/65168/tri-dva-jedan-kuhaj-23/ – Natko Kraševac Mar 26 '15 at 16:37
  • Those are importing the various JS and CSS files the website uses. You'll want to look for the SWF object. Beyond that, I'm not sure how to get the URL that the SWF is pulling. – wholevinski Mar 26 '15 at 17:00
  • Also, I didn't mention that when I click on the video/mp4 item (from the screenshot in the question) the .mp4 link is located in the response header. Maybe Selenium isn't needed? I just don't know how to extract this exact header from the HTTP response. – Natko Kraševac Mar 26 '15 at 17:57
  • I'm not sure Selenium is the right path to go down at this point. Maybe elaborate a little more in the original question on what exactly you're trying to do with the video; there might be other options depending on the end goal. – wholevinski Mar 26 '15 at 18:17

You just need to make an HTTP request to fetch the page and then search the response for the URL. Define an XPath expression and use lxml to extract the URL. Something like this (it is only an example and will probably not work as-is):

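import lxml.html as parser
import requests

def do_request(url):
    """Return the page body as text, or None if the request did not succeed."""
    r = requests.get(url)
    return r.text if r.status_code == 200 else None

path = "<define the XPath>"  # placeholder: XPath expression that selects the video URL
url = "<your url>"           # placeholder: address of the page to scrape

data = do_request(url)
if data:
    doc = parser.fromstring(data)
    url_res = doc.xpath(path)  # list of matches for the XPath in the page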
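
When no suitable XPath is known, a cruder alternative is to scan the fetched text for anything that looks like an .mp4 URL. This is only a sketch under a strong assumption: that the link appears literally somewhere in the downloaded text, for example inside an inline script. The comments below suggest that may not hold for this page. The function name find_mp4_urls and the example.com URLs are invented for the illustration.

```python
import re

# Absolute http(s) URLs ending in .mp4, optionally followed by a query string.
# Quotes, angle brackets and whitespace are treated as the end of a URL.
MP4_URL = re.compile(r"""https?://[^\s"'<>]+?\.mp4(?:\?[^\s"'<>]*)?""", re.IGNORECASE)


def find_mp4_urls(text):
    """Return the distinct .mp4 URLs found in text, in order of first appearance."""
    return list(dict.fromkeys(MP4_URL.findall(text)))


sample = 'player.load("http://example.com/media/a.mp4?ver=1"); <a href="http://example.com/page.html">x</a>'
print(find_mp4_urls(sample))
```

The same function can be applied to the text returned by do_request(url), or to the text of any script files the page loads.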
Hugo Sousa
  • There is no direct link in the HTML response. I think the page makes a script request to get the video link and then puts it in a Flash player. – Natko Kraševac Mar 26 '15 at 16:43
  • What you are trying to do is not easy because the video is "encapsulated" by the player. There is no "direct link" to get the video. Please check this post: http://stackoverflow.com/questions/8660526/extract-video-from-swf-using-python – Hugo Sousa Mar 26 '15 at 16:46
  • Actually there is a direct link, which is accessible through the already mentioned inspect element. The link is (server)..../repository/media/b/f/bf27b3354c83c37611e73f97495b5e1d.mp4?ver=1 I know it is not easy, I'm just wondering if it is possible. – Natko Kraševac Mar 26 '15 at 16:50
  • So if there is a direct link and you can see it through inspect element, you can get it from the HTTP response. I just opened the video at http://www.rtl.hr/repository/media/b/f/bf27b3354c83c37611e73f97495b5e1d.mp4?ver=1 so you can download the file through this link. With `r = requests.get(video_url)`, iterate over the content with `for chunk in r.iter_content(chunk_size=255)` and write each chunk to a file on disk. – Hugo Sousa Mar 26 '15 at 16:54
  • I think I wasn't clear about what I want to do. But thanks anyway. :) My goal is to scrape just the video link, because I need it for my application. It will get the link for the desired episode and play it in a local player such as VLC. So I need to write a script that accepts an episode link and returns only the .mp4 link. – Natko Kraševac Mar 26 '15 at 17:01
  • Also, I didn't mention that when I click on the video/mp4 item (from the screenshot in the question) the .mp4 link is located in the response header. Maybe Selenium isn't needed? I just don't know how to extract this exact header from the HTTP response. – Natko Kraševac Mar 26 '15 at 17:57
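
On the last comment's question about reading a response header from Python, here is a minimal sketch. It assumes the server answers HEAD requests, which not every server does, and it does not know which header (if any) carries the link on this site. The helper name fetch_headers is made up, and urllib is used only to keep the example free of dependencies; with requests, `requests.head(url).headers` exposes the same information.

```python
import urllib.request


def fetch_headers(url, timeout=10):
    """Send a HEAD request and return (final_url, headers) without downloading the body."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # geturl() is the address actually served, after any redirects were followed.
        return response.geturl(), dict(response.headers.items())


# Usage (the URL is a placeholder for the request seen in the Network tab):
# final_url, headers = fetch_headers("<url of the video request>")
# for name, value in headers.items():
#     print(name, value)
```

Because a HEAD request transfers no body, this fits the goal of obtaining only the link without downloading the video. It still requires knowing which request to repeat, which is the part the browser's Network tab reveals.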