I have HTML code from a website, and I would like to extract the path to an image, i.e. the value of the src attribute of the img tag. Specifically, I need the paths of images with a .jpg extension that sit inside wphimage tags.

The code is below:

<p>
    <wphimage data="{'Copyright':'John Smith','Alignment':'left','ImageVersion':'conductorportraitlong'}">
    <span style="display:block; float:left;" class="DIV_imageWrapper">
        <a data-lightview-title="John Smith"  class="lightview" href="//path/to/image/web.jpg">
            <img src="//path/to/image/web.jpg" alt="Name">
        </a>
        <a class="A_copyright" href="javascript:;">©&nbsp; <span>Terry Linke</span></a>
        <a href="javascript:;">≡ <span>John Smith</span></a>| 
        <a class="A_zoom lightview" href="//path/to/image/web.jpg" data-lightview-title="Dietfried Gürtler" data-lightview-caption="Terry Linke">+ </a>
    </span>
    </wphimage>

    Text here...
</p>

I tried with:

wphimage = re.findall(r'\S+\.jpg', text)

but I also got other values, from tags other than <img>.
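A minimal reproduction of the over-matching, as a sketch: the text value here is a shortened, hypothetical stand-in for the markup above, not the full page.

```python
import re

# Shortened, hypothetical stand-in for the markup in the question.
text = '''<a class="lightview" href="//path/to/image/web.jpg">
    <img src="//path/to/image/web.jpg" alt="Name">
</a>'''

# \S+\.jpg matches any run of non-whitespace characters ending in ".jpg".
# It does not care which tag or attribute the run belongs to, so the href
# value matches as well, and the attribute name and opening quote are included.
matches = re.findall(r'\S+\.jpg', text)
print(matches)
# ['href="//path/to/image/web.jpg', 'src="//path/to/image/web.jpg']
```

The pattern has no notion of tags, which is why values from <a> show up next to the one from <img>.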

Webdev
    Use BeautifulSoup, as mentioned here: https://stackoverflow.com/questions/43982002/extract-src-attribute-from-img-tag-using-beautifulsoup – Maurice Meyer May 28 '20 at 12:52
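
A sketch of the parser-based approach suggested in the comment above. BeautifulSoup is a third-party package, so this example swaps in the standard library's html.parser instead. The class name ImgSrcCollector and the sample string are hypothetical, and the sketch assumes only images nested inside wphimage are wanted.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collects src values of <img> tags inside <wphimage>."""

    def __init__(self):
        super().__init__()
        self.sources = []
        self._depth = 0  # how many <wphimage> elements are currently open

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "wphimage":
            self._depth += 1
        elif tag == "img" and self._depth > 0:
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)

    def handle_endtag(self, tag):
        if tag == "wphimage" and self._depth > 0:
            self._depth -= 1


# Made-up sample: one image outside wphimage, one inside.
sample = (
    '<p><img src="//path/to/outside.jpg">'
    '<wphimage data="x">'
    '<a href="//path/to/image/web.jpg">'
    '<img src="//path/to/image/web.jpg" alt="Name"></a>'
    '</wphimage></p>'
)

collector = ImgSrcCollector()
collector.feed(sample)
print(collector.sources)
# ['//path/to/image/web.jpg']
```

Because the parser reports tags and attributes separately, the href on the surrounding <a> is never confused with the src on <img>.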

1 Answer

You can try

wphimage = re.findall(r'<img.*src=\"(\S*|\w*)\"', txt)

output

['//path/to/image/web.jpg']

This regex captures the run of non-whitespace characters between the double quotes of the src attribute in an img tag. The |\w* alternative is redundant, because \S already matches every word character.
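
If several img tags can share a line, the greedy .* in the pattern above runs to the last src on that line and skips the earlier ones. A slightly tighter variant, as a sketch only: the name IMG_SRC, the sample string and the other.jpg path are made up for illustration.

```python
import re

# Hypothetical one-line sample with two <img> tags.
text = (
    '<a href="//path/to/image/web.jpg">'
    '<img alt="Name" src="//path/to/image/web.jpg"></a>'
    '<img src="//path/to/other.jpg">'
)

# [^>]*? cannot leave the current tag, and [^"]* stops at the closing quote,
# so each <img> yields its own src value.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc="([^"]*)"', re.IGNORECASE)

tight = IMG_SRC.findall(text)
print(tight)
# ['//path/to/image/web.jpg', '//path/to/other.jpg']

# For comparison, the original pattern on the same input:
greedy = re.findall(r'<img.*src=\"(\S*|\w*)\"', text)
print(greedy)
# ['//path/to/other.jpg']
```

This still assumes double-quoted attribute values. For arbitrary HTML, an HTML parser is the more reliable choice.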

Leo Arad