
We want to import a large XML file (13 MB) into Core Data. At the moment the file contains around 64,000 entries, but this number will increase in the future.

XML-Structure:

<entry name='...' doctype='' last-modified='...' [some more attributes]  />

After a lot of research, which included the XMLSchema sample project, the Ray Wenderlich XML tutorial and some Stack Overflow entries, we haven't found a solution yet.

We first download the XML file, then start parsing it and inserting the data into Core Data. Here is our implementation:

- (void)importXMLFile:(NSString*)fileName {

  NSInputStream* theStream = [[NSInputStream alloc] initWithFileAtPath:fileName];

  _theParser = [[NSXMLParser alloc] initWithStream:theStream];
  _theParser.delegate = self;

  dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
    [_theParser parse];
  });    
}


- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {

    if ([elementName isEqualToString:@"entry"]) {

        Importer* __weak weakSelf = self;

        NSManagedObjectContext* theContext = self.importContext;

        [theContext performBlock:^{

            CustomObject* mo;

            // Create ManagedObject
            // Read values from parsed XML element

            dispatch_async(dispatch_get_main_queue(), ^{

                // Call a handler, just for information "added object"

            });

            NSError *error = nil;

            if ([theContext hasChanges] && ![theContext save:&error]) {

                NSLog(@"Unresolved error %@, %@", error, [error userInfo]);
                abort();
            } else {
                DLOGError(error);
            }

        }];
    }

}

Using these methods, memory usage explodes, leading to a crash. The XML file seems to be parsed completely before Core Data processes even a single block. So the question is:

Is it possible to process the XML file in parts (e.g. 30 entries at a time), then save to Core Data, and only after that continue parsing?

Or, asked more generally: how can memory usage be optimized?

longi

3 Answers


You want to use a stream-based parser so you don't need to load the whole XML into memory at once. Perhaps this or something from GitHub.

You should also batch your save operations. Don't save every individual object; save groups of perhaps 100 objects. If this happens inside a tight loop, you should wrap the loop body in an autorelease pool.
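The batching advice can be sketched independently of Core Data. The following is a minimal illustration in plain C, not code from the question: `entry_batch`, `batch_add`, `batch_flush`, `record_save` and the capacity of 100 are all hypothetical names and values chosen for the example. The callback stands in for the place where the real save would happen.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_CAPACITY 100 /* assumed batch size ("perhaps 100 objects") */

/* Hypothetical callback that persists one batch of parsed entries. */
typedef void (*flush_fn)(const char *const *names, size_t count, void *user);

typedef struct {
    const char *names[BATCH_CAPACITY]; /* pending entries (borrowed pointers) */
    size_t      count;                 /* number of pending entries */
    flush_fn    flush;                 /* called once per full or final batch */
    void       *user;                  /* opaque context passed to flush */
} entry_batch;

void batch_init(entry_batch *b, flush_fn flush, void *user) {
    assert(b != NULL && flush != NULL);
    b->count = 0;
    b->flush = flush;
    b->user  = user;
}

/* Hand any pending entries to the callback. Call once more after parsing
   ends so that a final, partially filled batch is not lost. */
void batch_flush(entry_batch *b) {
    assert(b != NULL);
    if (b->count > 0) {
        b->flush(b->names, b->count, b->user);
        b->count = 0;
    }
}

/* Queue one entry; flush automatically when the batch is full. */
void batch_add(entry_batch *b, const char *name) {
    assert(b != NULL);
    b->names[b->count++] = name;
    if (b->count == BATCH_CAPACITY) {
        batch_flush(b);
    }
}

/* Stand-in for the real save: only counts how often it was called. */
typedef struct {
    size_t saves;   /* number of batches saved */
    size_t objects; /* total number of entries saved */
} save_stats;

void record_save(const char *const *names, size_t count, void *user) {
    (void)names; /* a real callback would create one object per name */
    save_stats *stats = user;
    stats->saves   += 1;
    stats->objects += count;
}
```

In the importer, the callback would create the managed objects for the batch and then call `save:` once on the import context. Calling `reset` on the context afterwards is one common way to let the saved objects be released, assuming nothing else still holds on to them.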

Wain

I guess our memory problem was caused by a line we didn't publish, in the code that creates our managed object. We had to free the xmlChar string returned by libxml2.

Instead of

xmlChar *xmlString = xmlTextReaderGetAttribute(reader, (xmlChar*)"someAttribute");
NSString *someAttributeToString = [NSString stringWithUTF8String:(const char *)xmlString];

we used

xmlChar * attributeString = xmlTextReaderGetAttribute(reader, (xmlChar*)"someAttribute");
if (attributeString)
{
    [elementDict setValue:[NSString stringWithUTF8String:(const char*)attributeString] forKey:@"someAttribute"];
    // The returned string is owned by the caller and must be freed
    xmlFree(attributeString);
}

We also pause our parser after every 100 parsed elements and wait until those elements have been written to Core Data. After that, we parse the next bundle of 100.

Parser

// Start the data parse
- (void) parse {

    _dictionaryQeue = [NSMutableArray new];

    xmlTextReaderPtr reader = xmlReaderForMemory([data bytes], [data length], NULL, NULL,
                                                 (XML_PARSE_NOBLANKS | XML_PARSE_NOCDATA | XML_PARSE_NOERROR | XML_PARSE_NOWARNING));

    if (!reader) {
        NSLog(@"Failed to create xmlTextReader");
        return;
    }

    while (xmlTextReaderRead(reader)) {

        @autoreleasepool {

            while (_isPaused) {

                //[NSThread sleepForTimeInterval:0.1];

            }

            switch (xmlTextReaderNodeType(reader)) {
                case XML_READER_TYPE_ELEMENT: {

                    NSMutableDictionary* elementDict = [NSMutableDictionary new];                    

                    //Create Object
                    xmlChar * nameString = xmlTextReaderGetAttribute(reader, (xmlChar*)"name");
                    if (nameString)
                    {
                        [elementDict setValue:[NSString stringWithUTF8String:(const char*)nameString] forKey:@"name"];

                        xmlFree(nameString);
                    }
                    //...

                    if (self.collectDictionaries) {

                        [_dictionaryQeue addObject:elementDict];
                        NSArray* dictArray = [NSArray arrayWithArray:_dictionaryQeue];

                        if ([dictArray count] == self.maxCollectedDictionaries) {

                            dispatch_async(dispatch_get_main_queue(), ^{

                                if (saxDelegate && [(NSObject*)saxDelegate respondsToSelector:@selector(SAXDictionaryElements:finished:)]) {

                                    [saxDelegate SAXDictionaryElements:dictArray finished:FALSE];

                                }

                            });

                            [_dictionaryQeue removeAllObjects];

                            _isPaused = TRUE;

                        }

                    }

                    elementDict = nil;

                }

                    break;

                case XML_READER_TYPE_END_ELEMENT: {

                    DLOGcomment(@"XML_READER_TYPE_END_ELEMENT");               
                    if (self.collectDictionaries) {

                        NSArray* dictArray = [NSArray arrayWithArray:_dictionaryQeue];

                        if ([dictArray count] > 0) {

                            dispatch_async(dispatch_get_main_queue(), ^{

                                if (saxDelegate && [(NSObject*)saxDelegate respondsToSelector:@selector(SAXDictionaryElements:finished:)]) {

                                    [saxDelegate SAXDictionaryElements:dictArray finished:TRUE];

                                }

                            });
                            data = nil;
                            [_dictionaryQeue removeAllObjects];
                            _dictionaryQeue = nil;

                        }

                    }
                }
                    break;
            }
        }
    }

    xmlTextReaderClose(reader);
    xmlFreeTextReader(reader);
    reader = NULL;
}
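The `while (_isPaused)` loop above spins at full speed while it waits, because the sleep call is commented out. A blocking wait avoids burning CPU during the pause. The following is a minimal sketch in plain C with POSIX threads, not the code we actually shipped; `parse_gate` and its functions are hypothetical names. In the Objective-C code, a GCD semaphore (`dispatch_semaphore_wait` / `dispatch_semaphore_signal`) could play the same role.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* A gate the parser thread waits on while a batch is being saved. */
typedef struct {
    pthread_mutex_t lock;    /* protects 'paused' */
    pthread_cond_t  resumed; /* signalled when the gate reopens */
    bool            paused;  /* true while the parser must wait */
} parse_gate;

#define PARSE_GATE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Parser side: close the gate right after handing off a full batch. */
void parse_gate_pause(parse_gate *g) {
    assert(g != NULL);
    pthread_mutex_lock(&g->lock);
    g->paused = true;
    pthread_mutex_unlock(&g->lock);
}

/* Saving side: reopen the gate once the batch has been written. */
void parse_gate_resume(parse_gate *g) {
    assert(g != NULL);
    pthread_mutex_lock(&g->lock);
    g->paused = false;
    pthread_cond_broadcast(&g->resumed);
    pthread_mutex_unlock(&g->lock);
}

/* Parser side: block (without spinning) until the gate is open.
   Returns immediately if the gate is not closed. */
void parse_gate_wait(parse_gate *g) {
    assert(g != NULL);
    pthread_mutex_lock(&g->lock);
    while (g->paused) {
        pthread_cond_wait(&g->resumed, &g->lock);
    }
    pthread_mutex_unlock(&g->lock);
}

/* Read the current state under the lock. */
bool parse_gate_is_paused(parse_gate *g) {
    assert(g != NULL);
    pthread_mutex_lock(&g->lock);
    bool paused = g->paused;
    pthread_mutex_unlock(&g->lock);
    return paused;
}
```

In terms of the parser above, `parse_gate_wait` would replace the `while (_isPaused)` loop, `parse_gate_pause` would replace `_isPaused = TRUE`, and the delegate would call `parse_gate_resume` after its Core Data save has finished.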
longi

DOM-based parsers (TBXML, TouchXML, KissXML, TinyXML, GDataXML, RaptureXML, etc.) are quite convenient, especially those with XPath support. However, memory becomes an issue because the whole DOM is built in memory.

I am facing the same memory constraints, so I started looking at wrappers for the libxml2 xmlTextReader. So far I have found only one: IGXMLReader.

IGXMLReader parses an XML document similar to the way a cursor would move. The reader is given an XML document and returns a node (an IGXMLReader object) on each call to nextObject.

Example:

IGXMLReader* reader = [[IGXMLReader alloc] initWithXMLString:@"<x xmlns:edi='http://ecommerce.example.org/schema'>\
                      <edi:foo>hello</edi:foo>\
                      </x>"];
for (IGXMLReader* node in reader) {
    NSLog(@"node name: %@", node.name);
}

This is a different approach from that of NSXMLParser.

Alex Nolasco