I need to insert a large xlsx (near 1 million rows) file in the Cassandra database and i'm having doubts about how to do it because of memory limitations.
I'm working with batch inserts but that is proving to be near impossible due to huge memory impact.
$batch = new Cassandra\BatchStatement(Cassandra::BATCH_UNLOGGED);
foreach ($workbook->createRowIterator($myWorksheetIndex) as $rowIndex => $values) {
if ($count > 0) {
$time = is_int($values[3]) ? $values[2]->format('d-m-Y') . ' ' . date('00:00:00') : $values[2]->format('d-m-Y') . ' ' . $values[3]->format('H:i:s');
$date = date_convert(DateTime::createFromFormat('d-m-Y H:i:s', $time));
$prepared = $session->prepare(
"INSERT INTO teste (ptd_assoc,ref_equip,dates) " .
"VALUES (?, ?, ?)"
);
$batch->add($prepared, array(
'ptd_assoc' => $values[0],
'ref_equip' => $values[1],
'dates' => new Cassandra\Timestamp(strtotime($date)),
// 'load' => 3.4454
));
}
$count++;
}
$session->execute($batch);
I have successfully transform the xlsx into a more readable csv file. It's possible to Copy
it to the database using the Cassandra\SimpleStatement
method?