I'm new to Iceberg, and I have a question about querying a big table.
We have a Hive table with a total of 3.6 million records and 120 fields per record, and we want to transfer all the records in this table to other systems, such as PostgreSQL, Kafka, etc.
Currently we do it like this:
```java
Dataset<Row> dataset = connection.client.read().format("iceberg").load("default.table");
// here it gets stuck for a very long time
dataset.foreachPartition(par -> {
    par.forEachRemaining(row -> {
        // process each row
    });
});
```
but it can get stuck for a long time in the foreach step.
I also tried the following method. It does not stay stuck for long, but the traversal is very slow, at roughly 50 records/second.
```java
HiveCatalog hiveCatalog = createHiveCatalog(props);
// TableIdentifier.of takes the namespace and table name as separate arguments
Table table = hiveCatalog.loadTable(TableIdentifier.of("default", "table"));
CloseableIterable<Record> records = IcebergGenerics.read(table).build();
records.forEach(record -> {
    // process each record
});
```
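To put that rate in perspective, a quick back-of-envelope calculation (using the 3.6 million records and ~50 records/second figures above) shows why this is unworkable for us:

```java
public class TraversalEstimate {
    public static void main(String[] args) {
        long totalRecords = 3_600_000L;   // records in the table
        long recordsPerSecond = 50L;      // observed traversal rate

        // total time to traverse every record at this rate
        long totalSeconds = totalRecords / recordsPerSecond;
        long hours = totalSeconds / 3600L;

        System.out.println(totalSeconds + " seconds = " + hours + " hours");
        // prints "72000 seconds = 20 hours"
    }
}
```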
Neither of these two approaches meets our needs. Does my code need to be modified, or is there a better way to traverse all the records? Thanks!