I have several files to parse with PHP so that I can insert their contents into different database tables.
First point: the client gave me 6 files. Five are CSV files with comma-separated values; the last one does not come from the same database and is tab-delimited.
I built a FileParser that uses SplFileObject to run a callback on each line of the file (basically, create an Entity from each row and persist it to the database, using Symfony2 and Doctrine2).
But I cannot manage to parse the tab-delimited text file with SplFileObject: it does not split the content into lines as I expect...
// In my controller context
$parser = new MyAmazingFileParser();
$parser->parse($filename, $delimitor, function ($data) use ($em) {
    $e = new Entity();
    $e->setSomething($data[0]);
    // [...]
    $em->persist($e);
});
// In my parser
public function parse($filename, $delimitor = ',', $run = null) {
    if (is_callable($run)) {
        $handle = new SplFileObject($filename);
        $infos = new SplFileInfo($filename);
        if ($infos->getExtension() === 'csv') {
            // Everything works fine here
            $handle->setCsvControl(',');
            $handle->setFlags(SplFileObject::DROP_NEW_LINE | SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY | SplFileObject::READ_CSV);
            // Skip the first line, then run the callback on each parsed line
            foreach (new LimitIterator($handle, 1) as $data) {
                $result = $run($data);
            }
        } else {
            // Why does the iterator approach not work here?
            $handle->setCsvControl("\t");
            // I have tried all the possible flag combinations, without success...
            foreach (new LimitIterator($handle, 1) as $data) {
                // It only ever gets the first line...
                $result = $run($data);
            }
            // And the old, dirty, memory-hungry way works?
            $fd = fopen($filename, 'r');
            $contents = fread($fd, filesize($filename));
            fclose($fd);
            foreach (explode("\t", $contents) as $line) {
                // Gets all the lines as I want... but it's dirty and memory-expensive!
                $result = $run($line);
            }
        }
    }
}
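For comparison, here is a minimal, self-contained sketch of the behaviour I expect from the tab-delimited branch. It assumes a well-formed file with "\n" line endings; the temporary file and its contents are made up for illustration and are not taken from the client's data.

```php
<?php
// Hypothetical sample data: one header line plus two tab-separated rows,
// terminated by "\n". This is NOT the client's file.
$path = tempnam(sys_get_temp_dir(), 'tsv');
file_put_contents($path, "id\tname\n1\tfoo\n2\tbar\n");

$file = new SplFileObject($path);
$file->setFlags(
    SplFileObject::READ_CSV
    | SplFileObject::READ_AHEAD
    | SplFileObject::SKIP_EMPTY
    | SplFileObject::DROP_NEW_LINE
);
// Tab as delimiter; enclosure and escape are passed explicitly.
$file->setCsvControl("\t", '"', '\\');

$rows = [];
// Skip the first (header) line, then collect each parsed row.
foreach (new LimitIterator($file, 1) as $data) {
    $rows[] = $data;
}

// Release the file handle before deleting the temporary file.
$file = null;
unlink($path);
```

With a file like this, each iteration should yield one row as an array of fields, which is what I cannot get from the client's file.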
It is probably related to the horrible formatting of my client's file, but after a long discussion with them, they really cannot provide another format, unfortunately, for some acceptable reasons (constraints on their side).
The file is currently 49459 lines long, so I really think memory usage matters at this step; I have to get the SplFileObject approach working, but I do not know how.
An extract of the file can be found here: Data-extract-hosted
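In case the line terminator is the culprit (an unconfirmed guess on my part), a small check like the following could count the carriage-return and line-feed bytes in the file without loading it all into memory. The function name `countLineTerminators` and the chunk size are made up for illustration.

```php
<?php
// Hypothetical helper: counts CR ("\r") and LF ("\n") bytes in a file,
// reading it in fixed-size chunks so memory use stays constant.
function countLineTerminators(string $path, int $chunkSize = 8192): array
{
    $fd = fopen($path, 'rb');
    if ($fd === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $counts = ['cr' => 0, 'lf' => 0];
    while (!feof($fd)) {
        $chunk = fread($fd, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $counts['cr'] += substr_count($chunk, "\r");
        $counts['lf'] += substr_count($chunk, "\n");
    }
    fclose($fd);

    return $counts;
}
```

If `lf` came back as 0 while `cr` was close to the expected number of records, that would suggest the file uses bare carriage returns as line terminators, which might explain why the iterator only sees a single line.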