13

xml:19558: parser error : XML declaration allowed only at the start of the document

Any solutions? I am using PHP's XMLReader to parse a large XML file, but I get this error. I know the file is not well-formed, but I don't think it is possible to go through the file and remove these extra declarations. Any ideas? Please help.

Aamir
  • 3
    If it's not well-formed, it's not XML. If it's not XML, then XMLReader isn't going to play nicely. – drudge Mar 29 '11 at 22:11
  • 1
    The only problem with the file is the multiple declarations :( Any way out? – Aamir Mar 29 '11 at 22:16
  • You need to delete the spaces! Here is a video showing how to spot and fix such errors: https://www.youtube.com/watch?v=4jWhO07ICvw – Juri Fab Nov 28 '16 at 13:44

4 Answers

31

Make sure there isn't any white space before the first tag. Try this:

    <?php
    // Declarations
    $file = "data.txt"; // The file to read from.

    // Read the file (note: this loads the whole file into memory)
    $fp = fopen($file, "r") or die("Couldn't open $file"); // Open the file
    $data = ""; // Initialize variable to contain the file's content
    while (!feof($fp)) // Loop through the file, read it till the end.
    {
        $data .= fgets($fp, 1024); // Append the next line (at most 1023 bytes) to data
    }
    fclose($fp); // Close file
    // End read file

    // Split after each closing </xml> tag so every XML occurrence gets its own string
    $split = preg_split('/(?<=<\/xml>)(?!$)/', $data);

    foreach ($split as $sxml) // Loop through each XML string
    {
        // Remove surrounding white space; a line break left in front of the
        // XML declaration would trigger the same parser error again.
        $sxml = trim($sxml);
        if ($sxml === '')
            continue;

        $reader = new XMLReader(); // Initialize the reader
        $reader->xml($sxml) or die("Could not load XML"); // Load the current XML string
        while ($reader->read()) // Read it node by node
        {
            switch ($reader->nodeType)
            {
                case XMLReader::ELEMENT: // Read element
                    if ($reader->name == 'record')
                    {
                        $dataa = $reader->readInnerXml(); // Get contents of the <record> tag.
                        echo $dataa; // Print it to screen.
                    }
                    break;
            }
        }
        $reader->close(); // Close reader
    }
    ?>

Set the $file variable to the file you want. Note that this reads the whole file into memory, so I don't know how well it will work for a 4 GB file. Tell me if it doesn't.

EDIT: Here is another solution. It should work better with the larger file, because it parses each XML document as it reads the file.

    <?php
    set_time_limit(0); // Parsing a file this large takes a long time; remove the time limit.
    // Declarations
    $file = "data.txt"; // The file to read from.

    // Read the file
    $fp = fopen($file, "r") or die("Couldn't Open"); // Open the file

    $startTag = "<xml";      // Start of each document's root element
    $endTag = "</xml>";      // End of each document
    $FoundXmlTagStep = 0;    // Number of characters of $startTag matched so far
    $FoundEndXMLTagStep = 0; // Number of characters of $endTag matched so far
    $curXML = "";            // The XML document currently being collected
    while (($data = fgetc($fp)) !== false) // Read the file one character at a time, till the end.
    {
        if ($FoundXmlTagStep < strlen($startTag))
        {
            // Outside a document: look for "<xml" and skip everything else,
            // including the extra XML declarations.
            if ($data === $startTag[$FoundXmlTagStep])
                $FoundXmlTagStep++;
            else
                $FoundXmlTagStep = ($data === "<") ? 1 : 0;

            if ($FoundXmlTagStep == strlen($startTag))
                $curXML = $startTag; // Start collecting a new document
            continue;
        }

        $curXML .= $data;

        // Inside a document: try to match the end of the XML
        if ($data === $endTag[$FoundEndXMLTagStep])
            $FoundEndXMLTagStep++;
        else
            $FoundEndXMLTagStep = ($data === "<") ? 1 : 0;

        if ($FoundEndXMLTagStep == strlen($endTag))
        {
            // Finished reading one XML document
            $FoundEndXMLTagStep = 0;
            $FoundXmlTagStep = 0;
            ParseXML($curXML);
        }
    }
    fclose($fp); // Close file

    function ParseXML($xml)
    {
        $reader = new XMLReader(); // Initialize the reader
        $reader->xml($xml) or die("Could not load XML"); // Load the current XML string
        while ($reader->read()) // Read it node by node
        {
            switch ($reader->nodeType)
            {
                case XMLReader::ELEMENT: // Read element
                    if ($reader->name == 'record')
                    {
                        $dataa = $reader->readInnerXml(); // Get contents of the <record> tag.
                        echo $dataa; // Print it to screen.
                    }
                    break;
            }
        }
        $reader->close(); // Close reader
    }
    ?>

Jess
  • No, that's not the case. Actually the XML declaration line is duplicated in the file many times; that's what the error report says. – Aamir Mar 29 '11 at 22:12
  • you have – Jess Mar 29 '11 at 22:13
  • Yes, but it appears multiple times now. How do I solve the issue? Something like removing these extra tags, but how? – Aamir Mar 29 '11 at 22:17
  • You could split the string at each – Jess Mar 29 '11 at 22:27
  • That's a nice idea, but how? :( – Aamir Mar 29 '11 at 22:32
  • I am very new to this and have no idea how to do it. Please help me out. – Aamir Mar 29 '11 at 22:33
  • Try this: `$split = preg_split('/(?<= – Jess Mar 29 '11 at 22:36
  • Let me post my code here; then let me know where to use this code in it. – Aamir Mar 29 '11 at 22:40
  • `$reader = new XMLReader(); $reader->open('data.xml') or die("File not found"); while ($reader->read()) { switch ($reader->nodeType) { case constant('XMLREADER::ELEMENT'): if ($reader->name == 'record') { $dataa = $reader->readInnerXml(); } break; } } $reader->close();` – Aamir Mar 29 '11 at 22:41
  • Hmm, that's good, but I have a file, not a string :( Please let me know how to replace that string with the file name. – Aamir Mar 29 '11 at 23:03
  • Thanks a lot. Now let me confirm: what is the first part going to do? I have an XML file, not text, and it is very large. Will this work for that? – Aamir Mar 29 '11 at 23:13
  • I can add comments on each part, if you like. Give me a minute. – Jess Mar 29 '11 at 23:13
  • I'm trying this to check whether it works, but it doesn't seem to work for that large file :( – Aamir Mar 29 '11 at 23:14
  • But whether it works or not, I am really grateful for your time. Thanks a lot. – Aamir Mar 29 '11 at 23:15
  • I guess you could parse it while you are looping through it (FYI, 4 GB will take a long time to parse; consider set_time_limit(0) so the script doesn't time out). – Jess Mar 29 '11 at 23:18
  • Try the one I just put up. It should work with the new file; if it doesn't, give me the error code. – Jess Mar 30 '11 at 00:18
2

Another possible cause of this problem is the Unicode file head (byte order mark). If your XML is encoded as UTF-8, the file content may start with the 3 bytes "EF BB BF". These bytes may be interpreted incorrectly if one attempts to convert from a byte array to a string. The solution is to write the byte array to the file directly, without calling getString on the byte array first.

ASCII: no file head
UTF-16 (little-endian, often labelled "Unicode"): FF FE
UTF-8: EF BB BF
UTF-32 (little-endian): FF FE 00 00

Just open the file in a hex editor such as UltraEdit and you can see these bytes.
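
To check for a file head without an editor, something like the following might work. It is only a sketch: the names BOMS, detectBom and stripUtf8Bom are made up for illustration, and the table covers just the common byte order marks.

```php
<?php
// Known byte order marks, longest first so that UTF-32 LE is not
// mistaken for UTF-16 LE (both begin with FF FE).
const BOMS = [
    "UTF-32LE" => "\xFF\xFE\x00\x00",
    "UTF-32BE" => "\x00\x00\xFE\xFF",
    "UTF-8"    => "\xEF\xBB\xBF",
    "UTF-16LE" => "\xFF\xFE",
    "UTF-16BE" => "\xFE\xFF",
];

// Return the encoding named by the file head at the start of $bytes,
// or null if $bytes does not start with one of the marks above.
function detectBom(string $bytes): ?string
{
    foreach (BOMS as $encoding => $bom) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }
    return null;
}

// Remove a leading UTF-8 file head, if present, so that the XML
// declaration is the very first thing in the string.
function stripUtf8Bom(string $bytes): string
{
    return detectBom($bytes) === "UTF-8" ? substr($bytes, 3) : $bytes;
}
```

For example, reading the first four bytes with file_get_contents($path, false, null, 0, 4) (where $path stands for your file) and passing them to detectBom would report which file head, if any, the file starts with.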

kaven
1

If you have multiple XML declarations, you likely have a concatenation of many XML files, and also more than one root element. It's not clear how you would meaningfully parse them.

Try really hard to get the source of the XML to give you real XML first. If that doesn't work, see if you can do some preprocessing to fix the XML before you parse it.
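
If the file really is a concatenation of complete XML documents, one possible preprocessing step is to cut it at every XML declaration and parse the pieces one by one. A minimal sketch, assuming each declaration starts with the literal text `<?xml ` (with a space after it) and that a single embedded document fits in memory; splitXmlDocuments is a hypothetical helper name, not part of XMLReader.

```php
<?php
// Yield each XML document from a file that is a concatenation of several
// documents. Assumes every document begins with the literal text "<?xml ".
function splitXmlDocuments(string $path): Generator
{
    $fp = fopen($path, "r");
    if ($fp === false) {
        throw new RuntimeException("Could not open $path");
    }
    try {
        $buffer = ""; // Text of the document currently being collected
        while (($line = fgets($fp)) !== false) {
            // Only the newly appended line can contain a new declaration;
            // offset 1 skips the declaration that opens the current document.
            $searchFrom = max(1, strlen($buffer));
            $buffer .= $line;
            while (($pos = strpos($buffer, "<?xml ", $searchFrom)) !== false) {
                // Everything before the declaration is one finished document.
                $doc = trim(substr($buffer, 0, $pos));
                if ($doc !== "") {
                    yield $doc;
                }
                $buffer = substr($buffer, $pos);
                $searchFrom = 1;
            }
        }
        // Whatever is left after the last declaration is the final document.
        $last = trim($buffer);
        if ($last !== "") {
            yield $last;
        }
    } finally {
        fclose($fp);
    }
}
```

Each yielded string starts with its own declaration and has no leading white space, so it could be passed to $reader->xml() in a foreach loop over splitXmlDocuments($file). Only one document is held in memory at a time, which matters for a file of this size.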

Ned Batchelder
  • Hmm, can you please let me know how to remove these extra declarations? Any simple PHP code? Actually I am very new to all this and just stuck here. – Aamir Mar 29 '11 at 22:15
  • I didn't get what you mean by this: "Try really hard to get the source of the XML to give you real XML first." – Aamir Mar 29 '11 at 22:15
  • Where do you get the XML from? Can you speak to the person in charge of producing the XML, because it isn't right, and should be corrected. For fixing the XML, look into PHP string replacement. – Ned Batchelder Mar 29 '11 at 22:16
  • Give us the entire xml document, and the php parser you have. Then we can help some more – Jess Mar 29 '11 at 22:16
  • @mazzzzz OK sure, but how can I provide you the file? It's around 4 GB :( – Aamir Mar 29 '11 at 22:18
  • @Ned Batchelder Right now I can't do anything like asking the people to correct the file :( PHP string replacement? – Aamir Mar 29 '11 at 22:19
  • Wow, why would you ever have a 4 GB XML file? Seems like I would use a database for that size of data. You could try str_split, but the size of the strings could be bigger than PHP can handle. I agree with Aamir, talk to the people who build the XML. – Jess Mar 29 '11 at 22:32
  • It's the data set that I want to insert into a database :( – Aamir Mar 29 '11 at 22:53
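
The string-replacement idea from these comments could look roughly like this: stream the file line by line, delete every XML declaration, and wrap everything in one made-up root element so that the result is a single document. This is a sketch under assumptions: no declaration spans more than one line, all the embedded documents are UTF-8, none of them carries a DOCTYPE, and the names mergeIntoSingleDocument and merged are invented for the example.

```php
<?php
// Copy $in to $out with every XML declaration removed and the whole
// content wrapped in one synthetic root element ($root is a made-up name).
function mergeIntoSingleDocument(string $in, string $out, string $root = "merged"): void
{
    $src = fopen($in, "r");
    if ($src === false) {
        throw new RuntimeException("Could not open $in");
    }
    $dst = fopen($out, "w");
    if ($dst === false) {
        fclose($src);
        throw new RuntimeException("Could not open $out");
    }

    // One declaration at the very start, then the opening root tag.
    fwrite($dst, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<$root>\n");

    while (($line = fgets($src)) !== false) {
        // Drop every XML declaration found on this line, keep the rest.
        fwrite($dst, preg_replace('/<\?xml\s[^?]*\?>/', '', $line));
    }

    fwrite($dst, "</$root>\n"); // Close the synthetic root element
    fclose($src);
    fclose($dst);
}
```

The output file could then be opened with $reader->open() as in the loop posted above; the record elements simply sit one level deeper than before. Because only one line is held in memory at a time, the size of the input should not matter much beyond the time it takes.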
0

It's a bug in PhpStorm. If you are using PhpStorm, it makes your code start from the second line (no matter what you do)! So you should go to your host, edit your file with the DirectAdmin or cPanel editor, and put your

    <?xml version="1.0" encoding="UTF-8" ?>

code on the first line. Hope it helps.

Arian Fm