
We are running an application in a Spring context, using DataNucleus for ORM and MySQL as our database.

Our application has a daily job that imports a data feed into the database. The size of the feed translates into roughly 1 million rows of inserts/updates. The import starts out performing very well, but the performance degrades over time (as the number of executed queries increases), and at some point the application freezes or stops responding. We then have to wait for the whole job to complete before the application responds again.

This behaviour looks very much like a memory leak to us, and we have been looking hard at our code to catch any potential problem, but the issue did not go away. One interesting thing we found in the heap dump is that org.datanucleus.ExecutionContextThreadedImpl (or its HashSet/HashMap) holds 90% of our memory (5 GB) during the import (I have attached screenshots of the dump below). My research on the internet suggests this reference is the Level 1 cache (not sure if I am correct). My question is: during a large import, how can I limit/control the size of the Level 1 cache? Can I perhaps ask DataNucleus not to cache during the import?

If that is not the L1 cache, what could be the cause of our memory issue?
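For reference, the kind of setting we are hoping for would look something like the sketch below. The datanucleus.cache.level1.type property (with values such as "soft", "weak" or "none") is what my research turned up, but we have not verified it ourselves, and everything else in the snippet is illustrative.

import java.util.Properties;

import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManagerFactory;

public class ImportPmfFactory {

    // Build a PMF for the import run only, with a weaker L1 cache.
    // Property name/values taken from the DataNucleus cache docs (unverified by us);
    // the connection settings are assumed to come from our normal configuration.
    public static PersistenceManagerFactory build(Properties baseConnectionProps) {
        Properties props = new Properties();
        props.putAll(baseConnectionProps);
        props.setProperty("datanucleus.cache.level1.type", "weak");
        return JDOHelper.getPersistenceManagerFactory(props);
    }
}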

Our code uses a transaction for every insert to prevent locking large chunks of data in the database. It calls the flush method every 2000 inserts.

As a temporary fix, we moved the import process to run overnight when no one is using the app. Obviously, this cannot go on forever. Could someone at least point us in the right direction so that we can do more research and hopefully find a fix?

It would also be good if someone with experience of decoding heap dumps could take a look.

Your help would be very much appreciated by all of us here. Many thanks!

https://s3-ap-southeast-1.amazonaws.com/public-external/datanucleus_heap_dump.png

https://s3-ap-southeast-1.amazonaws.com/public-external/datanucleus_dump2.png

Code below. The caller of this method does not have a transaction. This method processes one import object per call, and we need to process around 100K of these objects daily.

@Override
@PreAuthorize("(hasUserRole('ROLE_ADMIN')")
@Transactional(propagation = Propagation.REQUIRED)
public void processImport(ImportInvestorAccountUpdate account, String advisorCompanyKey) {

    ImportInvestorAccountDescriptor invAccDesc = account
            .getInvestorAccount();

    InvestorAccount invAcc = getInvestorAccountByImportDescriptor(
            invAccDesc, advisorCompanyKey);

    try {

        ParseReportingData parseReportingData = ctx
                .getBean(ParseReportingData.class);

        String baseCCY = invAcc.getBaseCurrency();
        Date valueDate = account.getValueDate();
        ArrayList<InvestorAccountInformationILAS> infoList = parseReportingData
                .getInvestorAccountInformationILAS(null, invAcc, valueDate,
                        baseCCY);

        InvestorAccountInformationILAS info = infoList.get(0);

        PositionSnapshot snapshot = new PositionSnapshot();
        ArrayList<Position> posList = new ArrayList<Position>();
        Double totalValueInBase = 0.0;
        double totalQty = 0.0;

        for (ImportPosition importPos : account.getPositions()) {
            Asset asset = getAssetByImportDescriptor(importPos
                    .getTicker());
            PositionInsurance pos = new PositionInsurance();
            pos.setAsset(asset);
            pos.setQuantity(importPos.getUnits());
            pos.setQuantityType(Position.QUANTITY_TYPE_UNITS);
            posList.add(pos);
        }

        snapshot.setPositions(posList);
        info.setHoldings(snapshot);

        log.info("persisting a new investorAccountInformation(source:"
                + invAcc.getReportSource() + ") on " + valueDate
                + " of InvestorAccount(key:" + invAcc.getKey() + ")");
        persistenceService.updateManagementEntity(invAcc);

    } catch (Exception e) {
        throw new DataImportException(invAcc == null ? null : invAcc.getKey(), advisorCompanyKey,
                e.getMessage());
    }

}

Gavy
  • Including the code for the bulk import is a prerequisite for any comment. People can't see whether you're using transactions, flushing regularly, or anything else. In my experience an L1 cache is not a HashSet, and I don't see how it could be when data has to be keyed by id – Neil Stockton Aug 02 '13 at 08:02
  • Our code uses a transaction for every insert to prevent locking large chunks of data in the database. It calls the flush method every 2000 inserts – Gavy Aug 05 '13 at 01:34
  • Yes, but where is the code? Only you know why you have half a million queries there! – Neil Stockton Aug 05 '13 at 17:18
  • I see no queries, nor JDO operations either. – Neil Stockton Aug 06 '13 at 04:28

1 Answer


Do you use the same PersistenceManager (pm) for the entire job?

If so, you may want to try closing it and creating a new one every so often, for example between batches.
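A minimal sketch of that pattern is below; the class name, the batch size and the per-record work are placeholders, and only the javax.jdo calls are real API.

import java.util.List;

import javax.jdo.PersistenceManager;
import javax.jdo.PersistenceManagerFactory;

public class BatchedImportRunner {

    // Re-create the PersistenceManager periodically so the L1 cache it holds
    // (one per PM) can be garbage collected between batches.
    public void runImport(PersistenceManagerFactory pmf, List<?> importRecords) {
        PersistenceManager pm = pmf.getPersistenceManager();
        try {
            int processed = 0;
            for (Object record : importRecords) {
                // ... do the insert/update work for this record with pm ...
                if (++processed % 2000 == 0) {
                    pm.close();                       // drops the L1 cache tied to this pm
                    pm = pmf.getPersistenceManager(); // fresh pm, fresh L1 cache
                }
            }
        } finally {
            if (!pm.isClosed()) {
                pm.close();
            }
        }
    }
}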

If not, this could be the L2 cache. What setting do you have for datanucleus.cache.level2.type? I think it's a weak map by default. You may want to try none for testing.
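One way to test that without touching your main application's configuration is a dedicated PMF for the job with the L2 cache disabled. A sketch follows; the connection settings are placeholders and the PMF class assumes the DataNucleus JDO binding.

import java.util.Properties;

import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManagerFactory;

public class NoL2CachePmf {

    // A separate PMF just for the import job, with the L2 cache turned off.
    public static PersistenceManagerFactory create() {
        Properties props = new Properties();
        props.setProperty("javax.jdo.PersistenceManagerFactoryClass",
                "org.datanucleus.api.jdo.JDOPersistenceManagerFactory");
        props.setProperty("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver");
        props.setProperty("javax.jdo.option.ConnectionURL", "jdbc:mysql://localhost/yourdb"); // placeholder
        props.setProperty("javax.jdo.option.ConnectionUserName", "user");                     // placeholder
        props.setProperty("javax.jdo.option.ConnectionPassword", "password");                 // placeholder
        props.setProperty("datanucleus.cache.level2.type", "none"); // disable the L2 cache for this PMF
        return JDOHelper.getPersistenceManagerFactory(props);
    }
}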

TheArchitect
  • The pm is retrieved from Spring's TransactionAwarePersistenceManagerFactoryProxy, so I guess it's the same pm. Regarding the DN L2 settings - we are using the default setting for the L2 cache, which I believe is of type "soft". Setting it to none is not an option for our production server as this would greatly impact performance – Gavy Aug 03 '13 at 10:43
  • So you may want to try closing the pm and getting a new one between your batches? – TheArchitect Aug 03 '13 at 15:35
  • I believe the pm does get closed after every transaction is committed. We are using a separate transaction per insert, so I guess we are already closing the pm. However, I did try doing that explicitly in the code following your comment, and the problem still persists – Gavy Aug 06 '13 at 03:26
  • You could also try to create a separate PMF for your job with datanucleus.cache.level2.type=none so that it doesn't affect your application. See if that helps. – TheArchitect Aug 06 '13 at 16:01
  • For anyone else having this problem: you can try looking into Query extensions in DataNucleus – Gavy Jun 12 '14 at 15:26