I am receiving a 'GC overhead limit exceeded' error when running gradle mlExportBatchesToDirectory.

The gradle command is:

gradle mlExportBatchesToDirectory -PexportPath=/qa2/tgtfiles/Customer -PbatchSize=25000 -Ptransform=CustomerTransform,querydate1,"$querydate1" -PfilenamePrefix="Customer_Daily_$(date "+%Y%m%d")_" -PfilenameExtension=.json -PwhereUrisQuery='cts.andQuery([cts.collectionQuery("latest"),cts.orQuery([cts.andQuery([cts.collectionQuery("customer"),cts.jsonPropertyScopeQuery("PolicyDownloadInfo", cts.trueQuery())]),cts.collectionQuery("customerpreference"),cts.collectionQuery("registrationcontacts")]),cts.notQuery(cts.jsonPropertyValueQuery("PartyId", "defaultprimary")),cts.notQuery(cts.jsonPropertyValueQuery("PartyId", "defaultadditional"))])' -PenvironmentName=qa

Even if I reduce the batch size down to 50, the error still occurs.
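For reference, the usual first step for a "GC overhead limit exceeded" error in any Gradle task is to raise the Gradle JVM heap in gradle.properties; the values below are illustrative, not a confirmed fix for this export:

    # gradle.properties - heap sizes are illustrative
    org.gradle.jvmargs=-Xmx4g -XX:MaxMetaspaceSize=512m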

Hank
  • How large are the documents? I can definitely see 25k docs in a batch being a potential problem. Under the hood, the task is using the ExportToWriterListener in marklogic-client-api (see the sketch after this thread), and that class requires that each document be converted into a string, which is likely what causes the memory issue. But I would think a batchSize of 50 or less would solve the problem. I'd try starting with a very simple query that returns a small number of documents - possibly just 1 to start - and then gradually work up to the full query to see when things break. – rjrudin Jun 19 '21 at 22:48
  • The documents are about 2-3 KB each. The full query brings back 1.6 million documents, which is the issue. When I pared the query down to a collection of only 250k documents, there was no memory error even with a batch size of 25k. However, no matter the batch size, the full query always fails. I still need to figure out how to transform and export all 1.6 million; do you have any suggestions on how to work around this memory error? – Hank Jun 21 '21 at 18:21
  • What version of ml-gradle? It's using the ML Java Client under the hood, and specifically a QueryBatcher, which is expected to pull batches of documents back so that e.g. it's never trying to hold 1.6 million documents in memory at one time. I'm wondering if there's a bug there where all the documents are coming back. – rjrudin Jun 22 '21 at 20:17
  • That's an interesting thought. How do I check which specific version of ml-gradle I have? It's not in my build.gradle file. I do have version 4.5.1 of Gradle and DHF version 5.2.3, if that helps. – Hank Jun 28 '21 at 19:51
  • "./gradlew buildEnvironment" . 5.2.3 is using a significantly older version of the ML Java Client as well. – rjrudin Jul 01 '21 at 11:13

0 Answers