I want to parse my website, find the <iframe> tag, and extract its URL (the value of the src attribute).

I tried it like this:

url=`wget -O - http://my-url.com/site 2>&1 | grep iframe`
echo $url

With this, I get the whole HTML line:

<iframe src="//player.vimeo.com/video/AAAAAAAA?title=0&amp;byline=0&amp;portrait=0" width="480" height="360" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>     </div>

How can I now extract the URL from this line? I tried a few sed expressions but couldn't get any of them to work. Here's what I tried:

wget -O - http://myurl.com/ 2>&1 | grep iframe | sed "s/<iframe src/\\n<iframe src/g"

Kind regards, Matt ;)

Barmar
Matt Backslash
  • Please show what you tried, so we can help you understand how you went wrong. You don't learn anything by just copying an answer. – Barmar Dec 11 '14 at 13:42
  • Well I tried it with this: `wget -O - http://myurl.com/ 2>&1 | grep iframe` and then tried to cut the html out except for the url: `sed "s/ – Matt Backslash Dec 11 '14 at 13:52

2 Answers

sed -n '/<iframe/s/^.*<iframe src="\([^"]*\)".*/\1/p'

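You don't need grep; sed's own pattern matching can select the line. The /<iframe/ address restricts the substitution to lines containing an iframe, -n together with the trailing p prints only those lines, and the capture group \(...\) picks out the URL inside the quotes of the src attribute.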
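As a minimal sketch of how that command could be used, the snippet below wraps it in a function and feeds it the sample line from the question instead of a live download. The function name extract_iframe_src and the variables html and src are made up for illustration; only the sed expression comes from the answer.

```shell
# extract_iframe_src is a hypothetical helper name; it reads HTML on stdin
# and prints the src value of each line that contains an <iframe src="...">.
extract_iframe_src() {
    sed -n '/<iframe/s/^.*<iframe src="\([^"]*\)".*/\1/p'
}

# The sample line from the question stands in for the downloaded page.
html='<iframe src="//player.vimeo.com/video/AAAAAAAA?title=0&amp;byline=0&amp;portrait=0" width="480" height="360" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>     </div>'

# Capture the extracted URL in a variable, then print it.
src=$(printf '%s\n' "$html" | extract_iframe_src)
printf '%s\n' "$src"
```

Against a real page, something like `wget -q -O - http://my-url.com/site | extract_iframe_src` should behave the same way; with -q, wget's progress messages are suppressed, so the 2>&1 redirect is not needed. The result still contains the &amp; entities and the protocol-relative // prefix exactly as they appear in the HTML.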

Barmar

You don't need sed; cut is sufficient:

~$ url='<iframe src="//player.vimeo.com/video/AAAAAAAA?title=0&amp;byline=0&amp;portrait=0" width="480" height="360" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>     </div>'
~$ echo "$url" | cut -d'"' -f 2
//player.vimeo.com/video/AAAAAAAA?title=0&amp;byline=0&amp;portrait=0
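One thing worth keeping in mind: cut splits purely on the double-quote character, so field 2 is whatever sits between the first pair of quotes on the line. The sketch below illustrates this with a shortened copy of the line from the question and a hypothetical variant in which the attributes are reordered; the variable names are made up for the example.

```shell
# Shortened version of the line from the question: src is the first quoted attribute.
line1='<iframe src="//player.vimeo.com/video/AAAAAAAA?title=0" width="480"></iframe>'
# Hypothetical variant with width placed before src.
line2='<iframe width="480" src="//player.vimeo.com/video/AAAAAAAA?title=0"></iframe>'

# Field 2 is the text between the first and second double quote on each line.
first=$(printf '%s\n' "$line1" | cut -d'"' -f2)
second=$(printf '%s\n' "$line2" | cut -d'"' -f2)

printf '%s\n%s\n' "$first" "$second"
# prints:
# //player.vimeo.com/video/AAAAAAAA?title=0
# 480
```

So the cut approach is fine for the markup shown in the question, but it depends on src staying the first quoted attribute on the line.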
fredtantini