I have this HTML:

<div class="_57-o acbk" data-store="{&quot;object_id&quot;:254863256822548}" id="u_0_3_XB" data-sigil="photo-stage marea">

 <img src="https://scontent.fdel10-1.fna.fbcdn.net/v/t39.30808-6/274092727_254863253489215_7220866489517235931_n.jpg?stp=cp0_dst-jpg_e15_fr_q65&amp;_nc_cat=107&amp;ccb=1-5&amp;_nc_sid=dd9801&amp;_nc_ohc=EAKHwJL9zzcAX9HXKnn&amp;_nc_ht=scontent.fdel10-1.fna&amp;oh=00_AT-T0elfrrjiMuDTzi2DO2nIS7zzjAjQkeROOj04Lv_v1A&amp;oe=6220A6D4" width="414" height="232" class="img" data-store="{&quot;imgsrc&quot;:&quot;https:\/\/scontent.fdel10-1.fna.fbcdn.net\/v\/t39.30808-6\/274092727_254863253489215_7220866489517235931_n.jpg?stp=cp0_dst-jpg_e15_fr_q65&amp;_nc_cat=107&amp;ccb=1-5&amp;_nc_sid=dd9801&amp;_nc_ohc=EAKHwJL9zzcAX9HXKnn&amp;_nc_ht=scontent.fdel10-1.fna&amp;oh=00_AT-T0elfrrjiMuDTzi2DO2nIS7zzjAjQkeROOj04Lv_v1A&amp;oe=6220A6D4&quot;}" alt="May be an image of 4 people and text" data-sigil="photo-image" data-store-id="0">

How do I get the URL inside the imgsrc tag?

Poornima Mishra
I'm sorry, but it isn't clear what you want. From the HTML above, what exactly do you need? The URL inside the img src? – Shawn Frank Feb 27 '22 at 16:34

1 Answer

The most robust solution is to use an HTML parser (e.g. Hpple or NDHpple or others).

You can get pretty close just using regular expressions (a.k.a. “regex”).

For example, a simple implementation with lookahead/lookbehind assertions might be:

if let range = string.range(of: #"(?<=<img src\s?=\s?")[^"]*(?=".*>)"#, options: [.regularExpression, .caseInsensitive]) {
    print(string[range])
}

A slightly more robust implementation, using capture groups, would be:

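// Capture group 1 is the quoted value of the src attribute of an <img> tag.
let pattern = #"<img\b.*\bsrc\s*=\s*"([^"]+)".*>"#
// The pattern is a fixed literal, so a failure here would be a programmer error.
let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)

// NSRegularExpression works with NSRange, so convert the String range.
let nsRange = NSRange(string.startIndex..., in: string)
regex.enumerateMatches(in: string, range: nsRange) { result, _, _ in
    // Convert the captured NSRange back to a Range<String.Index> before slicing.
    if let range = (result?.range(at: 1)).flatMap({ Range($0, in: string) }) {
        print(string[range])
    }
}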

The basic idea in both is to find the substring within the quotation marks between the <img src=" and the closing " and subsequent >. See the NSRegularExpression documentation for details about regex patterns.
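
As a self-contained sketch of the capture-group version, the helper below wraps the same pattern in a function and returns the matches instead of printing them. The function name `imageSources(in:)`, the sample string, and the `example.com` URL are made up for illustration. It also replaces `&amp;` with `&`, on the assumption that this is the only entity in the `src` value (as in the question's HTML); it is not a general entity decoder.

```swift
import Foundation

// Hypothetical helper (not part of the original answer): returns the src
// value captured by each match of the <img ... src="..."> pattern.
func imageSources(in html: String) -> [String] {
    let pattern = #"<img\b.*\bsrc\s*=\s*"([^"]+)".*>"#
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .caseInsensitive) else {
        return []
    }
    let nsRange = NSRange(html.startIndex..., in: html)
    return regex.matches(in: html, range: nsRange).compactMap { match -> String? in
        // Group 1 holds the raw attribute value between the quotes.
        guard let range = Range(match.range(at: 1), in: html) else { return nil }
        // Assumption: "&amp;" is the only entity present in the attribute.
        return html[range].replacingOccurrences(of: "&amp;", with: "&")
    }
}

// Made-up sample input, shortened from the shape of the question's HTML.
let sampleTag = #"<img src="https://example.com/a.jpg?x=1&amp;y=2" width="414" alt="photo">"#
print(imageSources(in: sampleTag))
```

Because `.*` is greedy, several `<img>` tags on one line would produce a single match, which is one of the limitations of the regex approach.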

But, strictly speaking, neither regex approach is quite correct, as they make many assumptions about the nature of the HTML. Regex simply is not a replacement for a proper HTML parser. See “RegEx match open tags except XHTML self-contained tags”.
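
If the value you want is the one stored under the `imgsrc` key of the `data-store` attribute rather than the `src` attribute, the same quick-and-dirty approach can be extended. This is a rough sketch, not a robust solution. It assumes the attribute is double-quoted, holds entity-encoded JSON, and uses only the `&quot;` and `&amp;` entities, as in the question's HTML. The function name `imgsrcFromDataStore(in:)` and the sample string are hypothetical.

```swift
import Foundation

// Hypothetical helper: extracts the "imgsrc" value from the data-store
// attribute of the first <img> tag, or nil if anything does not fit.
func imgsrcFromDataStore(in html: String) -> String? {
    // Group 1 is the raw, entity-encoded value of data-store on an <img> tag.
    let pattern = #"<img\b[^>]*\bdata-store\s*=\s*"([^"]*)""#
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .caseInsensitive),
          let match = regex.firstMatch(in: html, range: NSRange(html.startIndex..., in: html)),
          let range = Range(match.range(at: 1), in: html) else {
        return nil
    }

    // Assumption: only these two entities occur. "&amp;" is decoded last so
    // that an encoded "&amp;quot;" is not turned into a quotation mark.
    let json = html[range]
        .replacingOccurrences(of: "&quot;", with: "\"")
        .replacingOccurrences(of: "&amp;", with: "&")

    // The JSON parser converts the escaped "\/" sequences back into "/".
    guard let data = json.data(using: .utf8),
          let object = try? JSONSerialization.jsonObject(with: data) as? [String: Any] else {
        return nil
    }
    return object["imgsrc"] as? String
}

// Made-up sample input with the same attribute layout as the question's tag.
let sampleImg = #"<img src="https://example.com/a.jpg" data-store="{&quot;imgsrc&quot;:&quot;https:\/\/example.com\/a.jpg?x=1&amp;y=2&quot;}" alt="photo" data-store-id="0">"#
print(imgsrcFromDataStore(in: sampleImg) ?? "not found")
```

Handing the decoded text to `JSONSerialization` avoids unescaping the JSON by hand, but finding the attribute still relies on a regex, so the caveats above apply.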

If this is just a quick-and-dirty exercise, the regex will get you pretty close, but if you want robust parsing of image URLs, you should use one of the aforementioned HTML parsers.

Rob