
I'm fetching the value of an XML attribute in a libxml2 SAX parser, similarly to how the answer to this question suggests. Specifically, my code looks like this (`attributes[i].value` is an `xmlChar *`):

    int valueLength = (int) (attributes[i].end - attributes[i].value);
    value = [[[NSString alloc] initWithBytes:attributes[i].value
                                      length:valueLength
                                    encoding:NSUTF8StringEncoding
    ] autorelease];

However, for some reason, when the attribute value (a URL in this case) contains the entity `&amp;` in the original XML, the value I get contains `&#38;`.

Say what?

How do I get libxml2 to decode entities in attribute values (it seems to do so fine for text nodes), so that I just get `&`?
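For comparison, here is the same extraction in plain C. libxml2's SAX2 `startElementNs` callback receives attributes as a flat `const xmlChar **` array with five pointers per attribute (localname, prefix, URI, value, end), and the value is not NUL-terminated, which is why the length is computed as `end - value`. The `attributes[i].value` / `attributes[i].end` indexing above presumably comes from casting that array to a five-field struct. In this sketch `copy_attribute_value` is a hypothetical helper, and `unsigned char` stands in for `xmlChar` (which libxml2 defines as `unsigned char`) so it compiles without the libxml2 headers.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for libxml2's xmlChar (typedef unsigned char xmlChar). */
typedef unsigned char Char;

/* Hypothetical helper. SAX2 passes 5 pointers per attribute:
   [0] localname, [1] prefix, [2] URI, [3] value, [4] end.
   The value slice [value, end) is NOT NUL-terminated.
   Returns a malloc'ed, NUL-terminated copy of attribute i's value, or NULL. */
char *copy_attribute_value(const Char **attributes, int i)
{
    const Char *value = attributes[i * 5 + 3];
    const Char *end = attributes[i * 5 + 4];
    size_t len = (size_t) (end - value);

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, value, len);   /* copy raw bytes; nothing is decoded here */
    out[len] = '\0';
    return out;
}
```

This only copies bytes, so whatever the parser put in the slice (including a literal `&#38;`) comes through unchanged; decoding has to happen in the parser itself.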

theory

1 Answer


libxml2 does not replace entities by default; you have to turn that on when you create the xmlReader.

This code has an example:

http://xmlsoft.org/examples/reader2.c

The docs for XML_PARSE_NOENT are here:

http://xmlsoft.org/html/libxml-parser.html

Although it has been a while since I used the entity bits of libxml2, I recall having to do something to get the default entity resolver in place. Docs on that are here:

http://xmlsoft.org/xmlio.html

If this does not wrap it up, please ping me back and I'll look in the source for Foto Brisko; I had to handle it there.

Although the blog post is long-winded, I think the sample from here

http://bill.dudney.net/roller/objc/entry/libxml2_push_parsing

might have the entity stuff turned on as well, but it's been so long that I've forgotten, and I don't have time right now to go back through it.

Good luck!

Bill Dudney
  • Yes, your libxml2 blog post was the starting point for my code. It doesn't have anything related to entities in it AFAICT. And there is no `options` argument to `xmlCreatePushParserCtxt()` as there is to `xmlReaderForFile()`. But maybe I just need a function pointer in the right slot in the `simpleSAXHandlerStruct` struct? I'm looking into that now… – theory Feb 16 '11 at 16:57
  • Ah, found it. After the line that creates the context, `xmlContext = xmlCreatePushParserCtxt(&simpleSAXHandlerStruct, self, NULL, 0, NULL);`, I put another line to set options, `xmlCtxtUseOptions(xmlContext, XML_PARSE_NODICT | XML_PARSE_NOENT);`. That does the trick. Thanks! – theory Feb 16 '11 at 17:17
  • Now just using `XML_PARSE_NOENT`. No clue what `XML_PARSE_NODICT` is for; its one-line documentation doesn't mean much to me. `XML_PARSE_NOENT` is all I needed. I'm curious why entities are decoded in text nodes but need this setting to be decoded in attribute values. That is, why should it be any different? Anyway, thanks again. – theory Feb 16 '11 at 17:38