I am using the cassandra-driver Node package (4.6.3) and trying to migrate around 450,000 records from Cassandra to SQL. I tried ETL tools like Alteryx, but they don't work because of unsupported data types such as map and list, so I'm doing the migration with a Node.js scheduler instead. The problem is that the Cassandra driver does not stream through all the records. Here is the code:
const output = []
const params = []
const query = 'select * from users'

cassandraDriver.client.eachRow(query, params, { prepare: false, fetchSize: 1000 },
  function (n, row) {
    // Row callback: invoked once per streamed row
    output.push(1)
  },
  function (err, result) {
    // End-of-page callback: invoked once per fetched page
    if (err) {
      console.error('Paging error:', err)
      return
    }
    if (result.nextPage) {
      // Fetch the next page; both callbacks fire again when it arrives
      result.nextPage()
    } else if (output.length > 0) {
      console.log('Total size:', output.length)
    }
  })
When I check the Cassandra DB directly with select count(*) from users, the count is different from output.length in the code above, so the driver does not seem to stream all the rows that are present in Cassandra. Any idea why this is happening? Is it a problem with the package? I would also be happy with an alternative approach; I've been breaking my head over this.
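One alternative I'm considering is the driver's client.stream(), which is supposed to handle paging internally. This is only a sketch of how I understand it would be used (same cassandraDriver.client instance as above); I haven't yet confirmed whether it avoids the mismatch:

let streamed = 0
cassandraDriver.client.stream('select * from users', [], { prepare: true, fetchSize: 1000 })
  .on('readable', function () {
    // 'readable' fires once per page; read() returns null once the page is drained
    let row
    while ((row = this.read()) !== null) {
      streamed++
    }
  })
  .on('end', function () {
    console.log('Total rows streamed:', streamed)
  })
  .on('error', function (err) {
    console.error('Stream error:', err)
  })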
I have no idea why the Node driver reports an inconsistent count. Full pages of 1,000 rows come through fine, but the final partial page seems to get cut short. For example, with 9,602 records in Cassandra, the driver streams only about 9,588, and I can't tell why the last 14 or so rows are not counted.
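To cross-check the counts, I also put together this manual-paging sketch using execute() and pageState, carrying the page state forward explicitly (countByManualPaging is just a name I made up, and it assumes the same client instance as above):

async function countByManualPaging() {
  // Walk the table page by page, passing the previous page's state to the next query
  let total = 0
  let pageState
  do {
    const result = await cassandraDriver.client.execute(
      'select * from users',
      [],
      { prepare: true, fetchSize: 1000, pageState }
    )
    total += result.rows.length
    pageState = result.pageState // undefined on the last page
  } while (pageState)
  console.log('Rows counted via manual paging:', total)
}

If this loop agrees with count(*) but eachRow does not, that would point at how the last page is delivered through the eachRow callbacks rather than at the data itself.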