Changing the GemFire query ResultSender batch size

Question

I am experiencing a performance issue related to the default batch size of the query ResultSender in a client/server configuration. I believe the default value is 100.

If I run a simple query to get keys (plus some ORDER BY columns, which are needed because of the PARTITION Region type), this default batch size causes too many chunks to be sent back, even for 1000 records. In my tests, the total query time is less than 100 ms, yet the app takes more than 10 seconds to process those chunks.

If my answer does not address your question/problem, I suggest adding more detail to your problem statement: steps to reproduce, expected results, actual results, and preferably a test repo that reproduces the problem. – John Blum, Aug 01 '19 at 18:49

score 0 · Answer 1 · answered Aug 01 '19 at 18:48

Reading between the lines in your problem statement, it seems you are:

1. Executing an OQL query on a PARTITION Region (PR).
2. Running the query inside a Function, as recommended when executing queries on a PR.
3. Sending batch results (as opposed to streaming the results).

I also assume, since you posted exclusively in the #spring-data-gemfire channel, that you are using Spring Data GemFire (SDG) to:

1. Execute the query (e.g. by using the SDG GemfireTemplate; of course, you could also be using the GemFire Query API directly inside your Function).
2. Implement the server-side Function using SDG's Function annotation support.
3. Possibly (indirectly) use SDG's BatchingResultSender, as described in the documentation.

NOTE: The default batch size in SDG is 0, NOT 100. Zero means stream the results individually.

Regarding #2 & #3, your implementation might look something like the following:

@Component
class MyApplicationFunctions {

  @GemfireFunction(id = "MyFunction", batchSize = 1000) // batchSize is an int attribute
  public List<SomeApplicationType> myFunction(FunctionContext functionContext) {

    RegionFunctionContext regionFunctionContext = 
      (RegionFunctionContext) functionContext;

    Region<?, ?> region = regionFunctionContext.getDataSet();

    if (PartitionRegionHelper.isPartitionedRegion(region)) {
      region = PartitionRegionHelper.getLocalDataForContext(regionFunctionContext);
    }

    GemfireTemplate template = new GemfireTemplate(region);

    String OQL = "...";

    SelectResults<?> results = template.query(OQL); // or `template.find(OQL, args);`

    List<SomeApplicationType> list = ...;

    // process results, convert to SomeApplicationType, add to list

    return list;
  }
}

NOTE: Since you are most likely executing this Function "on Region", the FunctionContext type will actually be a RegionFunctionContext in this case.

The batchSize attribute on the SDG @GemfireFunction annotation (used for Function "implementations") allows you to control the batch size.
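To make the effect of a batch size concrete, here is a minimal plain-Java sketch of grouping a result list into batches. It does not use any GemFire or SDG classes and is not SDG's actual implementation; `BatchingSketch` and `toBatches` are hypothetical names used only for illustration, and the sketch deliberately covers positive batch sizes only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of GemFire or SDG.
final class BatchingSketch {

  private BatchingSketch() { }

  // Splits results into consecutive batches holding at most batchSize elements each.
  static <T> List<List<T>> toBatches(List<T> results, int batchSize) {

    if (batchSize <= 0) {
      throw new IllegalArgumentException("This sketch only handles positive batch sizes");
    }

    List<List<T>> batches = new ArrayList<>();

    for (int from = 0; from < results.size(); from += batchSize) {
      int to = Math.min(from + batchSize, results.size());
      // Copy the slice so each batch is independent of the source list
      batches.add(new ArrayList<>(results.subList(from, to)));
    }

    return batches;
  }
}
```

With 1000 results, `toBatches(results, 100)` produces 10 batches, while `toBatches(results, 1000)` produces a single batch, which is the kind of difference a larger batch size is meant to achieve.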

Of course, instead of using SDG's GemfireTemplate to execute queries, you can use the GemFire Query API directly, as mentioned above.

If you need even more fine-grained control over "result sending", then you can simply "inject" the ResultSender provided by GemFire into the Function, even if the Function is implemented using SDG, as shown above. For example:

@Component
class MyApplicationFunctions {

  @GemfireFunction(id = "MyFunction")
  public void myFunction(FunctionContext functionContext, ResultSender resultSender) {

    ...

    SelectResults<?> results = ...;

    // now process the results and use the `resultSender` directly
  }
}

This allows you to "send" the results however you see fit, as required by your application. You can batch/chunk results, stream, whatever.
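As a rough sketch of what chunked sending could look like, the plain-Java example below buffers results and flushes them chunk by chunk. To keep it compilable without GemFire on the classpath, `ChunkSender` is a hypothetical stand-in that mirrors only the `sendResult(..)` and `lastResult(..)` methods of GemFire's `ResultSender`; `ChunkedSending` and `sendInChunks` are likewise made-up names, not SDG or GemFire API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in mirroring sendResult(..) and lastResult(..) of GemFire's ResultSender.
interface ChunkSender<T> {
  void sendResult(T intermediateResult);
  void lastResult(T finalResult);
}

// Hypothetical helper for illustration; not part of GemFire or SDG.
final class ChunkedSending {

  private ChunkedSending() { }

  // Sends results in chunks of at most chunkSize elements.
  // Intermediate chunks go through sendResult(..); the final chunk goes through
  // lastResult(..), which is called exactly once, even when there are no results.
  static <T> void sendInChunks(Iterable<T> results, int chunkSize, ChunkSender<List<T>> sender) {

    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }

    List<T> chunk = new ArrayList<>();
    Iterator<T> iterator = results.iterator();

    while (iterator.hasNext()) {
      chunk.add(iterator.next());
      // Flush a full chunk only if more results follow; otherwise it becomes the last result
      if (chunk.size() == chunkSize && iterator.hasNext()) {
        sender.sendResult(chunk);
        chunk = new ArrayList<>();
      }
    }

    sender.lastResult(chunk);
  }
}
```

Inside a real Function, the injected `ResultSender` would take the place of `ChunkSender`. The important part of the design is that the final chunk is always sent with `lastResult(..)`, since that is what tells the receiving side that no more results are coming.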

Although, you should be mindful of the "receiving" side in this case!

One thing that might not be apparent to the average GemFire user is that GemFire's default ResultCollector implementation collects "all" the results first before returning them to the application. This means the receiving side does not support streaming or batching/chunking: results cannot be processed immediately as the server sends them (whether streamed, batched/chunked, or otherwise).

Once again, SDG helps you out here since you can provide a custom ResultCollector on the Function "execution" (client-side), for example:

@OnRegion(region = "SomePartitionRegion", resultCollector = "myResultCollector")
interface MyApplicationFunctionExecution {

  void myFunction();
}

In your Spring configuration, you would then have:

@Configuration
class ApplicationGemFireConfiguration {

  @Bean
  ResultCollector myResultCollector() {
    return ...;
  }
}

Your "custom" ResultCollector could return results as a stream, a batch/chunk at a time, etc.

In fact, I have prototyped a "streaming" ResultCollector implementation that will eventually be added to SDG, here.
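For illustration only (this is not the prototype mentioned above), the idea of a "streaming" collector can be modeled in plain Java with a blocking queue. The class below is a hypothetical sketch: it does not implement GemFire's `ResultCollector` interface and only borrows the `addResult`/`endResults` naming. A real collector would implement `ResultCollector` and its remaining methods, and would need to decide how to handle errors and timeouts.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical plain-Java model of a streaming collector.
// It does NOT implement GemFire's ResultCollector and supports a single consumer only.
final class StreamingCollectorSketch<T> implements Iterable<T> {

  // Marker signalling that no more results will arrive
  private static final Object END = new Object();

  private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

  // Called for each result (or chunk) as it arrives; null results are not supported
  void addResult(T result) {
    queue.add(result);
  }

  // Called once after the last result has been added
  void endResults() {
    queue.add(END);
  }

  // Lets the application consume results as they arrive instead of waiting for all of them
  @Override
  public Iterator<T> iterator() {
    return new Iterator<T>() {

      private Object next;

      @Override
      public boolean hasNext() {
        if (next == null) {
          try {
            next = queue.take(); // blocks until a result or the end marker is available
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            next = END; // treat interruption as the end of the stream
          }
        }
        return next != END;
      }

      @Override
      @SuppressWarnings("unchecked")
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        T result = (T) next;
        next = null;
        return result;
      }
    };
  }
}
```

The application can then iterate over the collector with a regular for-each loop while another thread is still calling `addResult(..)`, so each result is handled as soon as it is received rather than after the whole result set has been collected.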

Anyway, this should give you some ideas on how to handle the performance problem you seem to be experiencing. 1000 results is not a lot of data so I suspect your problem is mostly self-inflicted.

Hope this helps!

Thanks a lot for answering my questions. I am sorry that I didn't make myself clear enough. What I am trying to do is implement pagination according to your suggestions in another thread. In my case, since the filtering and sorting criteria are quite dynamic, I have to use the SDG GemfireTemplate to run the dynamically generated queries. As the first step of the pagination, I want to get back the keys (with the sorting columns, due to the Partition type) up to the current page (using the LIMIT keyword); then I can retrieve just the objects I need. I do not have any server Functions defined. – Yu Wang, Aug 01 '19 at 19:15
I see; thank you for clarifying. PagingAndSortingRepositories are something I plan to get to later this year, near S1P timeframe. Sorry for the delay, but you are on the right path, in thinking. — John Blum, Aug 01 '19 at 19:31
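The key-slicing step of the pagination approach described in these comments could be sketched as follows. This is only an assumption-laden illustration in plain Java: `KeyPagingSketch`, `limitFor` and `keysForPage` are hypothetical names, page numbers are assumed to be 1-based, and the key query is assumed to return keys already sorted and limited to everything up to the current page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of GemFire or SDG.
final class KeyPagingSketch {

  private KeyPagingSketch() { }

  // Value for the OQL LIMIT clause so the key query covers pages 1..pageNumber (1-based).
  static int limitFor(int pageNumber, int pageSize) {
    return Math.multiplyExact(pageNumber, pageSize);
  }

  // Extracts the keys of the requested page from the sorted keys returned by the LIMIT query.
  static <K> List<K> keysForPage(List<K> sortedKeysUpToPage, int pageNumber, int pageSize) {

    if (pageNumber < 1 || pageSize < 1) {
      throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
    }

    int from = (pageNumber - 1) * pageSize;

    if (from >= sortedKeysUpToPage.size()) {
      return Collections.emptyList(); // requested page lies past the end of the data
    }

    int to = Math.min(from + pageSize, sortedKeysUpToPage.size());

    return new ArrayList<>(sortedKeysUpToPage.subList(from, to));
  }
}
```

The keys returned by `keysForPage(..)` could then be passed to a bulk lookup such as `Region.getAll(..)` to fetch only the objects shown on the current page.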

score 0 · Answer 2 · answered Aug 01 '19 at 19:30

John,

Just to clarify, I use a client/server topology (actually WAN, but that is not important here). My client is a Spring Boot web app with a Kendo grid as the UI. Users can filter/sort on any combination of the columns, and those criteria are passed to the Spring Boot app to generate dynamic OQL and build the pagination. So far, apart from being dynamic, my OQL queries are quite straightforward. I do not want to introduce server-side Functions due to the complexity of our global deployment process, but I can if you think that is something I have to do.

Again, thanks for your answers.

Yu Wang

Thanks for clarifying. Generally, I recommend users follow these guidelines when using GemFire's `QueryService`, especially when the Region is a PR... https://gemfire.docs.pivotal.io/97/geode/developing/querying_basics/querying_partitioned_regions.html. – John Blum Aug 01 '19 at 19:34
I have double-checked and found nothing in my configuration that goes against those suggestions. With a simple OQL query like "select key, name from /region order by name limit 1000", I saw in the debugger that 10 chunks of data came back, each chunk using only 1/20 of the buffer capacity in my case. The code block below in AbstractOp.class was executed 10 times, and this causes the performance issue: do { msg.receiveChunk(); callback.handle(msg); } while(!msg.isLastChunk()); – Yu Wang Aug 01 '19 at 19:44