I have a method that gets the title from a URL.

It works, but on one website the match returns no result.

Do you have any idea where the problem is?

On that page, the title is Test - sds

NSURL *url_s = [NSURL URLWithString:url];
NSData *data = [NSData dataWithContentsOfURL:url_s];

if (data != nil) {
    NSString *newStr = [NSString stringWithUTF8String:[data bytes]];
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<title>(.*)</title>" options:0 error:NULL];

    NSTextCheckingResult *match = [regex firstMatchInString:newStr options:0 range:NSMakeRange(0, [newStr length])];

    NSString *title = [newStr substringWithRange:[match rangeAtIndex:1]];
}
Unmerciful
  • I don't know what the problem is, but I have seen some HTML where people capitalize the letters, so it's possible someone put ..... and this would not return a result in your regular expression – A'sa Dickens Oct 18 '13 at 12:59
  • Instead of passing `0` to the `options` parameter in `regularExpressionWithPattern:options:error:`, use `NSRegularExpressionCaseInsensitive`. Also, `newStr` can be assigned using `[NSString stringWithContentsOfURL:encoding:error:]`. There's no need to read the HTML into NSData and then convert to NSString. – neilco Oct 18 '13 at 13:05
  • Hi, I found where the problem is, but I need a good solution. The title contains newlines... – Unmerciful Oct 18 '13 at 13:17

2 Answers

You should use the NSRegularExpressionCaseInsensitive and NSRegularExpressionDotMatchesLineSeparators options when matching HTML against a pattern.

NSRegularExpressionOptions opts = NSRegularExpressionCaseInsensitive | NSRegularExpressionDotMatchesLineSeparators;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<title>(.*)</title>"
                                                                       options:opts
                                                                         error:NULL];
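
As a rough sketch of how these options could fit into the whole extraction, the snippet below wraps the match in a hypothetical helper named `TitleFromHTML` (the name and the helper are assumptions, not part of the original code). It also makes two changes beyond the options: the quantifier is lazy (`.*?`) so the match stops at the first `</title>` once `.` is allowed to cross newlines, and the result is trimmed because the title in question is surrounded by newlines. It returns nil instead of crashing when nothing matches.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns the trimmed text
// of the first <title> element found by the regex, or nil if there is none.
static NSString *TitleFromHTML(NSString *html) {
    if (html == nil) {
        return nil;
    }

    NSRegularExpressionOptions opts = NSRegularExpressionCaseInsensitive |
                                      NSRegularExpressionDotMatchesLineSeparators;
    NSError *error = nil;
    // <title\b[^>]*> tolerates attributes on the tag; (.*?) is lazy.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<title\\b[^>]*>(.*?)</title>"
                                                  options:opts
                                                    error:&error];
    if (regex == nil) {
        return nil;
    }

    NSTextCheckingResult *match = [regex firstMatchInString:html
                                                    options:0
                                                      range:NSMakeRange(0, html.length)];
    if (match == nil) {
        return nil; // no <title> found; avoid calling substringWithRange: with NSNotFound
    }

    NSString *raw = [html substringWithRange:[match rangeAtIndex:1]];
    return [raw stringByTrimmingCharactersInSet:
                    [NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
```

Checking `match` for nil matters here: `rangeAtIndex:` on a nil result yields a zero range rather than a valid capture, so the original code would silently produce an empty title.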
neilco

You cannot safely extract content from HTML or XML with regular expressions. HTML and XML are stateful, so they must actually be parsed as such. For example, a regular expression would return the wrong result for:

<html>
<head>
    <!--<title>Old Title</title>-->
    <title>New Title</title>
</head>
</html>

You should choose an HTML parser and use it. I've successfully used Hpple before in apps.
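
To illustrate the parser approach without depending on a third-party API, here is a minimal sketch that calls libxml2 directly (the C library that ships with iOS and macOS) and asks it for the title via XPath. The helper name `TitleUsingParser` is hypothetical, and this is not Hpple's own interface. It assumes the project links against libxml2 (`-lxml2`) and has the libxml2 headers on its search path.

```objc
#import <Foundation/Foundation.h>
#import <libxml/HTMLparser.h>
#import <libxml/xpath.h>

// Hypothetical helper: parses the HTML and returns the text of the first
// real <title> element, or nil. Commented-out markup is ignored because the
// parser treats it as a comment node, not as an element.
static NSString *TitleUsingParser(NSData *data) {
    if (data.length == 0) {
        return nil;
    }

    int options = HTML_PARSE_RECOVER | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING;
    htmlDocPtr doc = htmlReadMemory(data.bytes, (int)data.length, NULL, NULL, options);
    if (doc == NULL) {
        return nil;
    }

    NSString *title = nil;
    xmlXPathContextPtr ctx = xmlXPathNewContext(doc);
    if (ctx != NULL) {
        // string(//title) evaluates to the text of the first <title> in document order.
        xmlXPathObjectPtr result =
            xmlXPathEvalExpression(BAD_CAST "string(//title)", ctx);
        if (result != NULL) {
            if (result->type == XPATH_STRING && result->stringval != NULL) {
                title = [NSString stringWithUTF8String:(const char *)result->stringval];
            }
            xmlXPathFreeObject(result);
        }
        xmlXPathFreeContext(ctx);
    }
    xmlFreeDoc(doc);

    title = [title stringByTrimmingCharactersInSet:
                       [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    return title.length > 0 ? title : nil;
}
```

Passing the raw `NSData` to the parser also sidesteps the manual bytes-to-string conversion, since libxml2 detects the encoding itself when none is specified.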

Holly
  • +1 I might suggest Norbert see [How to Parse HTML on iOS](http://www.raywenderlich.com/14172/how-to-parse-html-on-ios) on Ray Wenderlich's site for a nice introduction. – Rob Oct 18 '13 at 16:40