
Hi, I hope you can help me! I have to split a huge XML file into smaller ones so I can load the data into a database. I have read a lot of posts and found a really good one; this is the URL:

How can I split a big XML file into smallers with PHP?

But I have some problems with it:

1. I have to read an XML file with 400,000 records, and the script stops at 170,000. I really don't know why; is there some change I have to make?
2. Is it possible to put the data in ?
3. I have to read a huge file and every browser crashes. Do you know of any simple software for Mac that can read the data from a URL?

Thanks a lot!

MORE INFORMATION ABOUT THE XML FILE:

Here is the XML format; in the real file, the three dots are replaced by the actual data.

<?xml version="1.0" encoding="UTF-8"?>

<vortigo> 

<annuncio> 

<id_annuncio> <![CDATA[ . . . ]]> </id_annuncio> 
<link> <![CDATA[ . . . ]]> </link> 
<titolo> <![CDATA[ . . . ]]> </titolo> 
<tipo_contratto> <![CDATA[ . . . ]]> </tipo_contratto> 
<tipologia> <![CDATA[ . . . ]]> </tipologia> 
<descrizione> <![CDATA[ . . . ]]> </descrizione> 

<classe_energetica> <![CDATA[ . . . ]]> </classe_energetica>
<indice_energetica> <![CDATA[ . . . ]]> </indice_energetica>
<numero_stanze> <![CDATA[ . . . ]]> </numero_stanze>
<numero_bagni> <![CDATA[ . . . ]]> </numero_bagni>
<superficie> <![CDATA[ . . . ]]> </superficie>
<stato_immobile> <![CDATA[ . . . ]]> </stato_immobile>
<prezzo> <![CDATA[ . . . ]]> </prezzo> 
<prezzo_giorno> <![CDATA[ . . . ]]> </prezzo_giorno>
<prezzo_settimana> <![CDATA[ . . . ]]> </prezzo_settimana>
<prezzo_scontato> <![CDATA[ . . . ]]> </prezzo_scontato>

<comune> <![CDATA[ . . . ]]> </comune> 
<nazione> <![CDATA[ . . . ]]> </nazione> 
<regione> <![CDATA[ . . . ]]> </regione> 
<provincia> <![CDATA[ . . . ]]> </provincia> 
<indirizzo> <![CDATA[ . . . ]]> </indirizzo> 
<cap> <![CDATA[ . . . ]]> </cap>
<zona> <![CDATA[ . . . ]]> </zona>
<longitudine> <![CDATA[ . . . ]]> </longitudine>
<latitudine> <![CDATA[ . . . ]]> </latitudine>
<data_aggiornamento> <![CDATA[ . . . ]]> </data_aggiornamento> 
<immagini>

<immagine>
<immagine_url> <![CDATA[ . . . ]]> </immagine_url>
<immagine_titolo> <![CDATA[ . . . ]]> </immagine_titolo>
</immagine>

<immagine>
<immagine_url> <![CDATA[ . . . ]]> </immagine_url>
<immagine_titolo> <![CDATA[ . . . ]]> </immagine_titolo>
</immagine>

...
</immagini> 

<tipo_venditore> <![CDATA[ . . . ]]> </tipo_venditore>
<agenzia_nome> <![CDATA[ . . . ]]> </agenzia_nome> 
<agenzia_comune> <![CDATA[ . . . ]]> </agenzia_comune> 
<agenzia_email> <![CDATA[ . . . ]]> </agenzia_email> 
<agenzia_url> <![CDATA[ . . . ]]> </agenzia_url> 

<piscina> <![CDATA[ . . . ]]> </piscina> 
<giardino> <![CDATA[ . . . ]]> </giardino> 
<condizionatore> <![CDATA[ . . . ]]> </condizionatore> 
<riscaldamento> <![CDATA[ . . . ]]> </riscaldamento> 
<balcone> <![CDATA[ . . . ]]> </balcone> 
<terrazzo> <![CDATA[ . . . ]]> </terrazzo> 
<ascensore> <![CDATA[ . . . ]]> </ascensore> 
<cucina> <![CDATA[ . . . ]]> </cucina> 
<arredato> <![CDATA[ . . . ]]> </arredato> 
<parcheggio> <![CDATA[ . . . ]]> </parcheggio> 

<portale> <![CDATA[ . . . ]]> </portale> 
<tipo_portale> <![CDATA[ . . . ]]> </tipo_portale> 
<logo_portale> <![CDATA[ . . . ]]> </logo_portale> 

</annuncio>

</vortigo>

The information is to be inserted into a database table with one column for each field. Thanks in advance!

Community
user2455263
  • Try setting a bigger value for `max_execution_time`. For example `ini_set('max_execution_time', 120);` http://www.php.net/manual/en/info.configuration.php#ini.max-execution-time – enenen Jun 05 '13 at 11:06
  • Hi, I did that in the PHP file: I put `set_time_limit(900); ini_set('memory_limit', '20000M');` but nothing changed. – user2455263 Jun 05 '13 at 11:22
  • You are asking the wrong question. Also, for large XML files where you hit a memory limit, consider using XMLReader; there are also iterators for it: [XMLReaderIterator](https://github.com/hakre/XMLReaderIterator) – hakre Jun 05 '13 at 11:28
  • Hmm, then are you sure that your XML is valid? – enenen Jun 05 '13 at 11:29
  • Your XML is not valid: you are missing a closing ` ` element. – ferdynator Jun 05 '13 at 12:37
  • Please stop rolling back your post to remove what was needed to re-open it. – George Stocker Jun 06 '13 at 15:21

2 Answers


What code are you using to parse the XML? As the answers to the question you refer to point out, you should not use the convenient SimpleXML for the whole file, as it is slow and memory-intensive. Here is a simple example using the XMLReader class, which works efficiently with large files because it streams them instead of reading the whole file into memory:

$xml = new XMLReader();
if (!$xml->open('file.xml')) {
    die("Could not open file.xml\n");
}

while ($xml->read()) {
    // elements only: skip end tags, text, CDATA etc.
    if ($xml->nodeType == XMLReader::ELEMENT) {

        // process the elements, e.g. in a switch statement:
        switch ($xml->name) {
            //...
        }
    }
}

$xml->close();

You can do your processing in the switch statement, for example, since you can access the content of the current node via the $xml->readOuterXML() function. If you want easier access to the content, you can parse specific parts with SimpleXMLElement again:

 $elem = new SimpleXMLElement($xml->readOuterXML());

Don't forget to unset $elem when you are done with it, to free memory for the upcoming entries. I use exactly this method and can parse 10k entries in 2 s with reasonable memory usage.
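Putting the loop and the readOuterXML()/SimpleXMLElement step together, a minimal sketch might look like the following. It assumes the structure from the question (a <vortigo> root with one <annuncio> per record); the function name readAnnunci and the sample data are made up for illustration. XMLReader::next('annuncio') skips over the current record's subtree to the next sibling <annuncio>, so only one record at a time is turned into a SimpleXMLElement.

```php
<?php
// Hypothetical helper: yields one SimpleXMLElement per <annuncio> record,
// so only a single record is held in memory at a time.
function readAnnunci(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    // Advance to the first <annuncio> start tag.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'annuncio') {
            $found = true;
            break;
        }
    }

    while ($found) {
        // LIBXML_NOCDATA turns the CDATA sections into plain text nodes.
        $elem = new SimpleXMLElement($reader->readOuterXml(), LIBXML_NOCDATA);
        yield $elem;
        unset($elem); // free this record before moving on

        // Skip the current subtree and go to the next sibling <annuncio>.
        $found = $reader->next('annuncio');
    }
    $reader->close();
}

// Tiny sample in the question's format, written to a temp file.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<vortigo>
  <annuncio><id_annuncio><![CDATA[1]]></id_annuncio><titolo><![CDATA[Casa]]></titolo></annuncio>
  <annuncio><id_annuncio><![CDATA[2]]></id_annuncio><titolo><![CDATA[Villa]]></titolo></annuncio>
</vortigo>
XML;
$path = tempnam(sys_get_temp_dir(), 'vortigo');
file_put_contents($path, $sample);

foreach (readAnnunci($path) as $annuncio) {
    echo trim((string) $annuncio->id_annuncio), ': ', trim((string) $annuncio->titolo), "\n";
}
```

Because the function is a generator, the caller's foreach drives the reading; nothing beyond the current record is kept around.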

As for your last question: you may want to split the content into smaller parts, or make the file downloadable so the user can open it as a whole on their own computer. Unfortunately, HTTP is not the fastest protocol and is not designed for massive file sizes.
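If you still want to split the file into smaller parts (as the processChunk script mentioned in the comments does), here is a sketch that does it with XMLReader rather than string handling, so each output file is well-formed on its own. The function name splitXml, the output-N.xml naming (borrowed from that processChunk comment) and the chunk size are assumptions for illustration.

```php
<?php
// Hypothetical helper: copies every $perFile <annuncio> records into its own
// well-formed file "$prefix-<n>.xml" and returns the list of files written.
function splitXml(string $source, string $prefix, int $perFile): array
{
    if ($perFile < 1) {
        throw new InvalidArgumentException('$perFile must be at least 1');
    }
    $reader = new XMLReader();
    if (!$reader->open($source)) {
        throw new RuntimeException("Cannot open $source");
    }

    // Position the reader on the first <annuncio>.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'annuncio') {
            $found = true;
            break;
        }
    }

    $files = [];
    $out = null;
    $count = 0;
    while ($found) {
        if ($count % $perFile === 0) {
            // Finish the previous chunk, then open a new one with its own root.
            if ($out !== null) {
                fwrite($out, "</vortigo>\n");
                fclose($out);
            }
            $file = sprintf('%s-%d.xml', $prefix, count($files));
            $out = fopen($file, 'w');
            if ($out === false) {
                throw new RuntimeException("Cannot write $file");
            }
            fwrite($out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<vortigo>\n");
            $files[] = $file;
        }
        // readOuterXml() copies the record as-is, CDATA sections included.
        fwrite($out, $reader->readOuterXml() . "\n");
        $count++;
        $found = $reader->next('annuncio');
    }

    if ($out !== null) {
        fwrite($out, "</vortigo>\n");
        fclose($out);
    }
    $reader->close();
    return $files;
}

// Build a 5-record sample and split it into files of 2 records each.
$records = '';
for ($i = 1; $i <= 5; $i++) {
    $records .= "<annuncio><id_annuncio><![CDATA[{$i}]]></id_annuncio></annuncio>";
}
$source = tempnam(sys_get_temp_dir(), 'vortigo');
file_put_contents($source, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<vortigo>{$records}</vortigo>\n");

$files = splitXml($source, sys_get_temp_dir() . '/output', 2);
print_r($files); // three files: 2 + 2 + 1 records
```

Since only one record is in memory at a time, the chunk size mainly controls how big each output file gets, not how much memory the split needs.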

Edit: I updated my gist on GitHub to match your example data. It might need some more configuration, e.g. because your <immagini> element requires nested loops, but it should give you a good idea of how to solve this.
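For the nested <immagini> part, one option is a second loop over the <immagine> children of each record, for instance to fill a separate images table keyed by id_annuncio. This is only a sketch: the function name extractImages, the separate table idea and the sample URLs are hypothetical.

```php
<?php
// Hypothetical helper: returns the images of one <annuncio> as rows of
// ['url' => ..., 'titolo' => ...], e.g. for a separate images table.
function extractImages(SimpleXMLElement $annuncio): array
{
    $images = [];
    if (!isset($annuncio->immagini)) {
        return $images; // record without an <immagini> block
    }
    // Inner loop over the nested <immagine> elements.
    foreach ($annuncio->immagini->immagine as $immagine) {
        $images[] = [
            'url'    => trim((string) $immagine->immagine_url),
            'titolo' => trim((string) $immagine->immagine_titolo),
        ];
    }
    return $images;
}

// Sample record with two images (made-up data).
$annuncio = new SimpleXMLElement(
    '<annuncio><id_annuncio>42</id_annuncio><immagini>'
    . '<immagine><immagine_url><![CDATA[http://example.com/1.jpg]]></immagine_url>'
    . '<immagine_titolo><![CDATA[Salotto]]></immagine_titolo></immagine>'
    . '<immagine><immagine_url><![CDATA[http://example.com/2.jpg]]></immagine_url>'
    . '<immagine_titolo><![CDATA[Cucina]]></immagine_titolo></immagine>'
    . '</immagini></annuncio>',
    LIBXML_NOCDATA
);

foreach (extractImages($annuncio) as $img) {
    // In a real import you would insert (id_annuncio, url, titolo) here.
    echo trim((string) $annuncio->id_annuncio), ' ', $img['url'], ' ', $img['titolo'], "\n";
}
```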

ferdynator
  • Hi, I'm using the script that starts with `function processChunk($xmlstring) { GLOBAL $CHUNKS; $xp = fopen($file = "output-$CHUNKS.xml", "w"); fwrite($xp, ''."\n"); fwrite($xp, ""); fwrite($xp, $xmlstring); fwrite($xp, ""); fclose($xp); print "Written $file\n"; $CHUNKS++; }` and so on. What I want is to split the XML file into smaller files so I can insert the data into the database. – user2455263 Jun 05 '13 at 11:20
  • In what format do you want the XML in the database? As a string, or mapped to a certain table? – ferdynator Jun 05 '13 at 11:26
  • I would like it in a table with multiple records! – user2455263 Jun 05 '13 at 11:27
  • That can be done with my code: add a `case` for every element that matches a table, then insert the values into your database. I created an example in this [gist](https://gist.github.com/ferdynator/644a0cfedea3c9072546) – ferdynator Jun 05 '13 at 11:34
  • Thanks a lot for your example, I'll try it. My XML file is like .. .. .. ; can I use it for that as well? Thanks – user2455263 Jun 05 '13 at 11:40
  • Sure. Check out the [SimpleXML documentation](http://de3.php.net/manual/de/class.simplexmlelement.php); it should give you a clue about how to go from here. Glad I could help :) – ferdynator Jun 05 '13 at 11:43
  • Hi, I'm reading the page you linked (SimpleXML documentation) but I still can't figure it out. Your script is really good; to load all the data, do I have to add each field to the switch cases? I'm going crazy... – user2455263 Jun 05 '13 at 12:09
  • You need to supply more information about the structure of your XML element if you want me to help you further. Please add it to your question. – ferdynator Jun 05 '13 at 12:12
  • Here is the XML format; in the real file, the three dots are replaced by the actual data. <![CDATA[ . . . ]]> mandatory <![CDATA[ . . . ]]> <![CDATA[ . . . ]]> <![CDATA[ . . . ]]> <![CDATA[ . . . ]]> Thank you very much for your help! – user2455263 Jun 05 '13 at 12:18
  • Please [edit your question](http://meta.stackexchange.com/questions/21788/how-does-editing-work) accordingly and format the code correctly so it is easier to review. ty :) – ferdynator Jun 05 '13 at 12:24
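To follow the suggestion in the comments of inserting the values into the database, here is a sketch using PDO prepared statements: prepare the INSERT once, then execute it for each record. Assumptions: an in-memory SQLite database stands in for your real one (this needs the pdo_sqlite extension; change the DSN for MySQL), only three of the question's fields are mapped, and the hard-coded $records array stands in for the strings you would get from readOuterXML() inside the XMLReader loop. The function name insertAnnuncio is hypothetical.

```php
<?php
// Hypothetical helper: binds one <annuncio> to the prepared INSERT.
// Only three of the question's fields are mapped here.
function insertAnnuncio(PDOStatement $stmt, SimpleXMLElement $annuncio): void
{
    $stmt->execute([
        ':id_annuncio' => trim((string) $annuncio->id_annuncio),
        ':titolo'      => trim((string) $annuncio->titolo),
        ':prezzo'      => trim((string) $annuncio->prezzo),
    ]);
}

// Assumption: SQLite in memory for the demo; use your MySQL DSN instead.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE annunci (id_annuncio TEXT PRIMARY KEY, titolo TEXT, prezzo TEXT)');

// Prepare once, execute once per record.
$stmt = $pdo->prepare(
    'INSERT INTO annunci (id_annuncio, titolo, prezzo) VALUES (:id_annuncio, :titolo, :prezzo)'
);

// Stand-ins for the strings returned by readOuterXML() in the loop.
$records = [
    '<annuncio><id_annuncio>1</id_annuncio><titolo>Casa</titolo><prezzo>100000</prezzo></annuncio>',
    '<annuncio><id_annuncio>2</id_annuncio><titolo>Villa</titolo><prezzo>250000</prezzo></annuncio>',
];

// Grouping many inserts in one transaction is usually much faster than
// committing each row separately.
$pdo->beginTransaction();
foreach ($records as $xmlString) {
    insertAnnuncio($stmt, new SimpleXMLElement($xmlString, LIBXML_NOCDATA));
}
$pdo->commit();

echo $pdo->query('SELECT COUNT(*) FROM annunci')->fetchColumn(), "\n";
```

With 400,000 records you would probably commit every few thousand rows rather than holding one huge transaction open.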

Most likely your script crashes for one of the following reasons: 1) the memory/time limit for PHP scripts, which can be set in your php.ini file; 2) invalid values in your XML that the parser script you use cannot parse.
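To find out which of the two it is, here is a sketch that raises the limits for a single run and then walks the whole file with XMLReader, counting <annuncio> records, remembering the last id_annuncio it saw, and collecting libxml's parse errors so you can see where reading stopped. If the file is malformed around record 170,000, the reported line number should point at it. The function name checkFeed, the returned array keys and the limit values are assumptions for illustration.

```php
<?php
// Raise the limits for this run only (example values; adjust to your server).
set_time_limit(0);               // 0 means no time limit
ini_set('memory_limit', '512M');

// Collect libxml errors instead of printing them as warnings.
libxml_use_internal_errors(true);

// Hypothetical diagnostic: counts <annuncio> records, remembers the last
// id_annuncio it saw and reports the first parse error, if any.
function checkFeed(string $path): array
{
    libxml_clear_errors();
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    $lastId = null;
    // read() returns false at the end of the file or at a fatal parse error.
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        if ($reader->name === 'annuncio') {
            $count++;
        } elseif ($reader->name === 'id_annuncio') {
            // expand() gives a DOM node; textContent includes CDATA text.
            $node = $reader->expand();
            $lastId = $node === false ? null : trim($node->textContent);
        }
    }
    $reader->close();

    $errors = libxml_get_errors();
    libxml_clear_errors();
    return [
        'records' => $count,
        'last_id' => $lastId,
        'error'   => $errors
            ? sprintf('line %d: %s', $errors[0]->line, trim($errors[0]->message))
            : null,
    ];
}

$feed = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($feed, '<vortigo><annuncio><id_annuncio><![CDATA[7]]></id_annuncio></annuncio></vortigo>');
print_r(checkFeed($feed));
```

If 'error' is null and 'records' matches the expected total, the file itself is fine and the problem is more likely in the limits or in the processing code.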