-4

I am trying to scan the text below, but I do not understand how NSScanner works. Could anybody help me?

<a class="lightbox"  title ="elecciones mexico 2012" href="http://www.myWebpage.com/wp-content/uploads/2012/07/elecciones-mexico-2012.jpg"><img src="http://www.myWebpage.com/wp-content/uploads/2012/07/elecciones-mexico-2012.jpg" alt="" title="elecciones mexico 2012" width="643" height="391" class="aligncenter size-full wp-image-66795" /></a></p>
<p>I need this text</p>
<p> And this text.</p>
<p> Also this text! </p>

<p> I dont want this text </p>]]>

My final string should be something like: I need this text And this text Also this text!

Thanks in advance

  • Did you search at all on this site or the web? Good first step. – spring Jul 04 '12 at 18:24
  • Yes I did, but I can't understand the use of NSScanner... – user1179587 Jul 04 '12 at 18:26
  • Do you understand Objective-C at all, or are you new to it? Obj-C can be weird. :) Have you looked at this? https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Strings/Articles/Scanners.html – Almo Jul 04 '12 at 18:28
  • Look in the sidebar ---------------------> there are a dozen Q+As about NSScanner. Surely they are relevant to what you need to do. – spring Jul 04 '12 at 18:28
  • What are the distinguishing characteristics of the text you want to keep? – Nikolai Ruhe Jul 04 '12 at 18:28

2 Answers

0

NSScanner is the wrong tool for this. That's what NSXMLParser is for.

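@interface TextParser : NSObject <NSXMLParserDelegate> {
    NSMutableString *text;
}
// The original snippet used an undefined `string`; it is passed in here instead.
- (id)initWithString:(NSString *)string;
@end

@implementation TextParser

- (id)initWithString:(NSString *)string
{
    if ((self = [super init]))
    {
        text = [[NSMutableString alloc] init];
        NSXMLParser *parser = [[NSXMLParser alloc] initWithData:[string dataUsingEncoding:NSUTF8StringEncoding]];
        parser.delegate = self;
        [parser parse];

        // here text will contain all the text contained by the XML tags
    }
    return self;
}

// Delegate callback: called for each run of character data between tags.
- (void)parser:(NSXMLParser *)p foundCharacters:(NSString *)chars
{
    [text appendString:chars];
}

@end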
0

Well, the NSScanner that I know (Mac OS) isn't particularly suitable for the kind of parsing you are looking for. It just goes through a string and returns "tokens" such as numbers, or strings defined by the set of characters they are made of. This is not particularly useful for processing the tags in your example string, unless you are willing to accept a high chance of errors.

In that case, you could probably do something like "read a string composed of anything but <" and attach that to the result string, then "read a string composed of anything but >" and discard that and so on, until you have reached the end. Depending on what you are actually trying to parse this may or may not work; it's definitely not "the way" to get the plain text from HTML.
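The scan-and-discard loop described above could be sketched roughly like this. `StripTags` is a hypothetical helper name, not something from NSScanner itself, and the sketch assumes that every `<` starts a tag and every tag ends at the next `>`. It keeps all text outside tags, so it cannot tell the paragraphs you want from the one you don't.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: drops everything between '<' and '>' and
// returns whatever text is left.
static NSString *StripTags(NSString *html)
{
    NSScanner *scanner = [NSScanner scannerWithString:html];
    // NSScanner skips whitespace and newlines by default; turn that off
    // so leading spaces in the kept text survive.
    scanner.charactersToBeSkipped = nil;

    NSMutableString *result = [NSMutableString string];
    while (![scanner isAtEnd]) {
        NSString *chunk = nil;
        // "Read a string composed of anything but <" and keep it.
        if ([scanner scanUpToString:@"<" intoString:&chunk]) {
            [result appendString:chunk];
        }
        // "Read a string composed of anything but >" and discard it,
        // then consume the '>' itself.
        [scanner scanUpToString:@">" intoString:NULL];
        [scanner scanString:@">" intoString:NULL];
    }
    return result;
}
```

Calling `StripTags(@"<p>I need this text</p><p> And this text.</p>")` should give `I need this text And this text.`; a stray `<` or `>` inside the text itself would break it, which is the "high chance of errors" mentioned above.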

It's not well-formed XML either (the tags don't match), so using NSXMLParser probably isn't an option.
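One way to check this for a given input is to look at the return value of `-parse`, which is `NO` when the parser gives up, with the reason available in `parserError`. A minimal sketch, where `IsWellFormedXML` is a hypothetical helper name:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES if NSXMLParser accepts the string as XML.
static BOOL IsWellFormedXML(NSString *string)
{
    NSData *data = [string dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    // No delegate is needed just to find out whether parsing succeeds.
    BOOL ok = [parser parse];
    if (!ok) {
        NSLog(@"Parse failed: %@", parser.parserError);
    }
    return ok;
}
```

A fragment like the one in the question, with a closing `</p>` that has no opening tag, should make this return `NO`.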

Christian Stieber