I want to get only the link from this string:

"<p><a href=\"https://www.youtube.com/watch?v=i2yscjyIBsk\">https://www.youtube.com/watch?v=i2yscjyIBsk</a></p>\n"

The output I want is https://www.youtube.com/watch?v=i2yscjyIBsk

So, how can I achieve it?

I have tried:

func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try NSRegularExpression(pattern: regex)
        let nsString = text as NSString
        let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
        // Returns the whole match for each result, not the capture group.
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

And I tried this regex: "<a[^>]+href=\"(.*?)\"[^>]*>.*?</a>"

But I still can't figure it out.

Eric Aya
user3127109
  • Possible duplicate of [What is the best practice to parse html in swift?](http://stackoverflow.com/questions/31080818/what-is-the-best-practice-to-parse-html-in-swift) – Pascal Nov 25 '16 at 15:02

1 Answer

Using the NSDataDetector class, you can extract links directly:

import Foundation

let text = "<p><a href=\"https://www.youtube.com/watch?v=i2yscjyIBsk\">https://www.youtube.com/watch?v=i2yscjyIBsk</a></p>\n"
// Ask the detector for links only.
let types: NSTextCheckingResult.CheckingType = .link

if let detector = try? NSDataDetector(types: types.rawValue) {
    // NSRange counts UTF-16 units, so use the NSString length, not the character count.
    let range = NSRange(location: 0, length: (text as NSString).length)
    for match in detector.matches(in: text, options: [], range: range) {
        // url is nil for non-link results, so unwrap it instead of forcing it.
        if let url = match.url {
            print(url)
        }
    }
}

Description: The NSDataDetector class can match dates, addresses, links, phone numbers and transit information. Reference.

The results of matching are returned as NSTextCheckingResult objects. However, the NSTextCheckingResult objects returned by NSDataDetector differ from those returned by the base class NSRegularExpression.

Each result returned by NSDataDetector has one of the data detector types and carries the properties that correspond to that type. For example, results of type date have a date, timeZone, and duration; results of type link have a url, and so forth.


There is another way to get the link, or any other string, between the <a> ... </a> tags:

let string = "<p><a href=\"https://www.youtube.com/watch?v=i2yscjyIBsk\">https://www.youtube.com/watch?v=i2yscjyIBsk</a></p>\n"
let str = string.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression, range: nil)
print("string: \(str)")

Output:

string: https://www.youtube.com/watch?v=i2yscjyIBsk

Note: I suggest using the solution above to get links specifically. Thanks.
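If what you need is the href target itself rather than the text inside the tag, one option is to read the regex capture group instead of the whole match. The following is only a minimal sketch under a few assumptions: `hrefTargets(in:)` is a hypothetical helper name, the pattern is a shortened form of the one in the question, it only handles double-quoted href values, and a regex is not a full HTML parser.

```swift
import Foundation

/// Returns the value of every double-quoted href attribute in the anchor tags of `html`.
/// `hrefTargets(in:)` is a hypothetical helper name used for illustration.
func hrefTargets(in html: String) -> [String] {
    // Capture group 1 holds the text between the quotes after href=.
    let pattern = "<a[^>]+href=\"(.*?)\"[^>]*>"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return []
    }
    let nsString = html as NSString
    let fullRange = NSRange(location: 0, length: nsString.length)
    return regex.matches(in: html, options: [], range: fullRange).map { match in
        // range(at: 1) is the capture group; match.range would be the whole tag.
        nsString.substring(with: match.range(at: 1))
    }
}

let html = "<p><a href=\"https://www.youtube.com/watch?v=i2yscjyIBsk\">https://www.youtube.com/watch?v=i2yscjyIBsk</a></p>\n"
print(hrefTargets(in: html))
```

The matching function in the question returns `$0.range`, which is the range of the entire match; using `range(at: 1)` is what selects the part in parentheses. This also returns the href when the link text is something other than a URL.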

vaibhav
  • Just note that this does not extract the link target (the href in the anchor) but the *text* between `<a>` and `</a>`. That text need not be a link and need not be equal to the href. – Martin R Nov 25 '16 at 08:43
  • @MartinR I respect your explanation. I found this solution working in my case too, which is why I suggested it. Can you please elaborate on `That text need not be a link and need not be equal to the href`, so I can update both my understanding and the suggestion :) – vaibhav Nov 25 '16 at 08:55
  • What I meant is that for `let string = "<p><a href=\"https://www.google.com\">What??</a></p>\n"` your code will extract `What??` and not `https://www.google.com`. – Martin R Nov 25 '16 at 09:06
  • Okay, now I understand what you meant ` ... `. I will search for a better solution, and it would be nice if you could provide some links. Thanks very much :) – vaibhav Nov 25 '16 at 09:28