Fellow Programmers,

I searched the forum but couldn't find an answer to my problem.

I am trying to parse a 2 GB XML file in C using Expat. Here is a snippet from my code (I have removed the parts that do not relate to my problem):

void main(int argc, char **argv) {
  XML_Parser p = XML_ParserCreate(NULL);
  FILE *fp;
  fp = fopen("/dev/shm/GNBIExport_XML_RT_06_11_2015_07_48_53_953_10_100_5_153.xml","r");
  XML_UseParserAsHandlerArg(p);
  XML_SetElementHandler(p, start_hndl, end_hndl);
  XML_SetCharacterDataHandler(p, char_hndl);
  char buffer[10000000];
  memset(buffer, 0, 10000000);
  size_t file_size = 0;
  file_size = fread(buffer, sizeof(char), 10000000, fp);
  while (file_size != 0) {
    if (XML_Parse(p, buffer, strlen(buffer), XML_FALSE) == XML_STATUS_ERROR) {
      printf("Encountered error\n");
      exit(-1);
    }
    file_size = fread(buffer, sizeof(char), 10000000, fp);
  }
}

As you can see, I read from the file into a buffer of 10000000 bytes.

My problem is that I sometimes get a malformed XML error or a mismatched tag error. My understanding is that, because the XML file is huge, a read may put an opening tag into the buffer without its closing tag, which would explain the mismatched tag error. The malformed XML error would come from a tag being cut off in the middle. For example, if the XML is

<Transmission>  <BTSTEMPLATERSC> <attributes><TEMPLATENAME>defaultOfBTS30</TEMPLATENAME></attributes> </BTSTEMPLATERSC>

and the buffer receives only

<Transmission>  <BTSTEMPLATERSC> <attributes><TEMPLATENAME>defaultOfBTS30</TEMPLATENAME></attributes> </BTSTEMPL

then the closing BTSTEMPLATERSC tag is incomplete, hence the malformed XML error.

So, can someone please help me understand how to read the XML data in chunks correctly, so that these two errors can be avoided?

Thanks, Sarwesh

Sarwesh Suman
  • I don't know Expat, but with [libxml2](http://www.xmlsoft.org/) you can avoid loading the whole file into a buffer. – LPs Feb 19 '16 at 09:56
  • Would it make more sense to use a SAX-compliant parser? 2 GB is a pretty large file... – Tim Feb 19 '16 at 09:58
  • I don't know whether there is a SAX-compliant parser in C. Can you suggest one? – Sarwesh Suman Feb 19 '16 at 10:12
  • You are attempting to read 10000000 bytes, knowing that you will read fewer bytes. The number of bytes read is returned in "file_size", and yet you do not pass "file_size" to XML_Parse(); instead you use "strlen(buffer)". Each read is placed into buffer, and yet you do not "memset" (i.e. clear out) buffer before each read. – TonyB Feb 19 '16 at 10:38
  • TonyB: sorry, this is not production-quality code; I wrote it only to illustrate my problem. Thanks for your comment. Any suggestion for my problem is most welcome. – Sarwesh Suman Feb 22 '16 at 09:20

0 Answers