
I want to load a large number of rows, so my plan is to split the query into parts by timestamp range and then run the parts asynchronously.
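Here is a minimal sketch of the splitting step. It assumes the ranges are half-open [start, end), so a row whose timestamp falls exactly on a boundary is read only once. In CQL that would be a condition like ts >= :startDateTime AND ts < :endDateTime, where ts is a hypothetical column name because the actual query is not shown. TimeRange and RangeSplitter are hypothetical names, and TimeRange is a simplified stand-in for the question's Range class.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the question's Range class:
// a half-open interval [start, end).
record TimeRange(Instant start, Instant end) {}

final class RangeSplitter {
    private RangeSplitter() {}

    // Splits [from, to) into consecutive slices of length `step`.
    // The last slice is truncated so it never extends past `to`.
    static List<TimeRange> split(Instant from, Instant to, Duration step) {
        if (step.isZero() || step.isNegative()) {
            throw new IllegalArgumentException("step must be positive");
        }
        List<TimeRange> ranges = new ArrayList<>();
        Instant cursor = from;
        while (cursor.isBefore(to)) {
            Instant next = cursor.plus(step);
            if (next.isAfter(to)) {
                next = to; // shorten the final slice
            }
            ranges.add(new TimeRange(cursor, next));
            cursor = next;
        }
        return ranges;
    }
}
```

For example, splitting 2020-02-13T17:00:00Z to 2020-02-14T17:00:00Z (midnight to midnight at +07:00, the offset in the output below) with Duration.ofHours(1) yields 24 ranges.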

...
// One pending future per time range
List<CompletableFuture<AsyncResultSet>> pending = new ArrayList<>();

for(Range range : ranges) {
    System.out.println("Asynchronous execute query will be called soon!");
    pending.add(executeQuery(session, preparedStatement, range));
}

...

private static CompletableFuture<AsyncResultSet> executeQuery(CqlSession session, 
    PreparedStatement preparedStatement, Range range) {

    return session
        .executeAsync(preparedStatement.bind()
            .setInstant("startDateTime", range.getStartDateTime().toInstant())
            .setInstant("endDateTime", range.getEndDateTime().toInstant())
            .setPageSize(1000000))
        .toCompletableFuture()
        .whenCompleteAsync((asyncResultSet, throwable) -> {
            if (throwable == null) {
                System.out.println("Range " + range.getStart() + " to " + range.getEnd() + 
                    " has " + asyncResultSet.remaining() + " records.");

                fetchResultSet(asyncResultSet, throwable);

                if(asyncResultSet.hasMorePages()) {
                    asyncResultSet.fetchNextPage().whenComplete(LoadCassandraAsync::fetchResultSet);
                }
            } else {
                throwable.printStackTrace();
            }
        }, Executors.newFixedThreadPool(4))
        .exceptionally(throwable -> {
            throwable.printStackTrace();
            return null;
        });
}
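In the DataStax Java driver 4.x, AsyncResultSet exposes currentPage(), remaining(), hasMorePages() and fetchNextPage(), which returns a CompletionStage<AsyncResultSet>. As written, executeQuery requests at most one additional page unless the elided fetchResultSet also calls fetchNextPage(). The future it returns also completes before that page has been processed, so waiting on pending would not cover it. With setPageSize(1000000) and roughly 100,000 rows per hour in the output below, those particular ranges appear to fit in a single page, so this mostly matters for larger ranges.

The sketch below shows the usual recursive pattern. It chains pages with thenCompose so the returned future completes only after the last page. Page, ListPage and Paging are hypothetical stand-ins that let it run without a cluster. The same structure would apply to AsyncResultSet.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Consumer;

// Hypothetical interface mirroring the AsyncResultSet methods used here,
// so the sketch runs without a Cassandra cluster.
interface Page<T> {
    Iterable<T> currentPage();
    boolean hasMorePages();
    CompletionStage<Page<T>> fetchNextPage();
}

final class Paging {
    private Paging() {}

    // Consumes every page, starting from `page`. The returned future completes
    // only after the LAST page has been consumed and carries the total row count.
    static <T> CompletableFuture<Long> consumeAll(Page<T> page, Consumer<? super T> sink) {
        return consumeFrom(page, sink, 0L);
    }

    private static <T> CompletableFuture<Long> consumeFrom(Page<T> page, Consumer<? super T> sink, long soFar) {
        long count = soFar;
        for (T row : page.currentPage()) {
            sink.accept(row);
            count++;
        }
        if (!page.hasMorePages()) {
            return CompletableFuture.completedFuture(count);
        }
        final long total = count;
        // Chain the next page into the same future instead of firing it off separately.
        return page.fetchNextPage().toCompletableFuture()
                .thenCompose(next -> consumeFrom(next, sink, total));
    }
}

// In-memory fake that serves a list in fixed-size pages (demonstration only).
final class ListPage<T> implements Page<T> {
    private final List<T> all;
    private final int from;
    private final int size;

    ListPage(List<T> all, int from, int size) {
        this.all = all;
        this.from = from;
        this.size = size;
    }

    public Iterable<T> currentPage() {
        return all.subList(from, Math.min(from + size, all.size()));
    }

    public boolean hasMorePages() {
        return from + size < all.size();
    }

    public CompletionStage<Page<T>> fetchNextPage() {
        int next = from + size;
        // Simulates an asynchronous round trip for the next page.
        return CompletableFuture.supplyAsync(() -> new ListPage<>(all, next, size));
    }
}
```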

Randomly, the program exits with exit code 0 (not from the main method), which indicates it was closed. Other times, nothing more happens after some rows have been fetched, as if a thread were still running but doing nothing.

If I comment out the row-fetching part, I get:

...
Asynchronous execute query will be called soon!
Asynchronous execute query will be called soon!
Asynchronous execute query will be called soon!
Asynchronous execute query will be called soon!
Range 2020-02-14 00:00:00+0700 to 2020-02-14 01:00:00+0700 has 102974 records.
Range 2020-02-14 01:00:00+0700 to 2020-02-14 02:00:00+0700 has 98201 records.
Range 2020-02-14 06:00:00+0700 to 2020-02-14 07:00:00+0700 has 104529 records.
Range 2020-02-14 08:00:00+0700 to 2020-02-14 09:00:00+0700 has 105257 records.
...

I think this means the executeQuery() method works correctly.

What am I doing wrong?
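Depending on what the elided parts do, two details in the visible code could produce the exit-code-0 and hang symptoms described above.

First, if main returns (or closes the CqlSession) without waiting on the futures in pending, the program can finish before the callbacks run.

Second, executeQuery creates a new Executors.newFixedThreadPool(4) on every call and never shuts it down. Those pool threads are non-daemon and do not time out by default. Once started, they keep the JVM alive after the work is done, which looks like a thread that is running but doing nothing.

The sketch below uses one shared pool, waits explicitly for every future, and shuts the pool down in finally. RunAll and runAndWait are hypothetical names, and startQuery stands in for something like the question's executeQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

final class RunAll {
    private RunAll() {}

    // Starts one async request per input, runs every result handler on ONE
    // shared pool, blocks until all handlers finish, then shuts the pool down.
    static <I, R> void runAndWait(List<I> inputs,
                                  Function<I, CompletableFuture<R>> startQuery,
                                  Consumer<R> handler) {
        ExecutorService callbacks = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (I input : inputs) {
                pending.add(startQuery.apply(input).thenAcceptAsync(handler, callbacks));
            }
            // join() blocks until every future is done and rethrows the first failure.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } finally {
            // Without this, the non-daemon pool threads would keep the JVM alive.
            callbacks.shutdown();
            try {
                callbacks.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```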

panoet

1 Answer


Depending on the number of queries, you might be exhausting Cassandra's read threads, set by concurrent_reads in cassandra.yaml (the default is 32).
If you check the logs (/var/log/cassandra/system.log), there should be a message related to the issue. To fix this, throttle the client, for example by pausing with Thread.sleep after every 200 queries.
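The answer suggests pausing after a fixed number of queries. A related variant waits for each batch of requests to complete before sending the next one, instead of sleeping for a fixed time. The sketch below is a minimal version of that. Batches and inBatches are hypothetical names, and startQuery stands in for an async query such as executeQuery. The batch size (200 in the answer) is a tuning assumption, not a recommended value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

final class Batches {
    private Batches() {}

    // Sends at most `batchSize` async requests at a time. Every request in a
    // batch must complete before the next batch is started.
    static <I, R> List<R> inBatches(List<I> inputs, int batchSize,
                                    Function<I, CompletableFuture<R>> startQuery) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<R> results = new ArrayList<>(inputs.size());
        for (int from = 0; from < inputs.size(); from += batchSize) {
            List<CompletableFuture<R>> batch = new ArrayList<>();
            for (I input : inputs.subList(from, Math.min(from + batchSize, inputs.size()))) {
                batch.add(startQuery.apply(input));
            }
            // Wait for the whole batch, collecting results in input order.
            for (CompletableFuture<R> f : batch) {
                results.add(f.join());
            }
        }
        return results;
    }
}
```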

Alex Tbk
  • What do you mean by "queries"? If I loop through a ResultSet and call several getters on each row, does that count as one query? – panoet Feb 24 '20 at 05:53
  • You execute your queries asynchronously. That means if you have 1000 queries, all of them are being executed at the same time. – Alex Tbk Feb 24 '20 at 06:27
  • Does that mean it is not wise to load large amounts of data asynchronously? I have to load historical data going back up to a year. My plan is to divide it by hour and process each hour asynchronously. FYI, one hour of data contains about 3 million rows. Any suggestions? – panoet Feb 24 '20 at 13:38
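A year of hourly ranges is 365 × 24 = 8,760 queries. At about 3 million rows per hour, that is roughly 26 billion rows, so firing all the queries at once is exactly the situation the comment above describes.

A common alternative to fixed batches is a sliding window. A Semaphore caps the number of outstanding requests, and a new request starts as soon as any earlier one finishes. The sketch below is a minimal version of that, using the hypothetical names Throttled and run. maxInFlight is a tuning assumption. R would typically be something small, such as a per-range row count, because holding the rows themselves in memory would not be feasible at this volume.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

final class Throttled {
    private Throttled() {}

    // Keeps at most `maxInFlight` requests outstanding. A permit is taken
    // before each request starts and released when its future completes.
    static <I, R> List<R> run(List<I> inputs, int maxInFlight,
                              Function<I, CompletableFuture<R>> startQuery) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<R>> pending = new ArrayList<>(inputs.size());
        for (I input : inputs) {
            permits.acquireUninterruptibly(); // blocks while the window is full
            CompletableFuture<R> f;
            try {
                f = startQuery.apply(input);
            } catch (RuntimeException e) {
                permits.release(); // the request never started
                throw e;
            }
            f.whenComplete((result, error) -> permits.release());
            pending.add(f);
        }
        // Wait for the remaining requests and collect results in input order.
        List<R> results = new ArrayList<>(inputs.size());
        for (CompletableFuture<R> f : pending) {
            results.add(f.join());
        }
        return results;
    }
}
```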