1

The below code takes long to process records My Db has millions of records

  • 1000 records processed
  • Time Taken: 32149s
/**
     * Saves monthly budget accounts to the database.
     * 
     * @param dto                       the list of monthly budget accounts to save
     * @param branchRepo                the repository for branches
     * @param costCodesRepo             the repository for cost codes
     * @param budgetVersionRepo         the repository for budget versions
     * @param monthlyBudgetAccountsRepo the repository for monthly budget accounts
     * @return the number of accounts saved
     */
    public int saveMonthlyBudgetAccounts(List<MonthlyBudgetAccountsDTO> dto) {

        long startTime = System.currentTimeMillis(); 
        
        // Fetch required data from database and store in memory for later use
        List<Branch> branchList = branchRepo.findAll();
        Map<String, Branch> branchMap = branchList.parallelStream()
                .collect(Collectors.toMap(b -> b.getSapBranch().toLowerCase(), Function.identity()));

        List<CostCodes> costCodesList = costCodesRepo.findAll();
        Map<String, CostCodes> costCodesMap = costCodesList.parallelStream()
                .collect(Collectors.toMap(c -> c.getCostCodeDescription().toUpperCase().trim(), Function.identity()));

        List<BudgetVersion> budgetVersionList = budgetVersionRepo.findAll();
        Map<String, BudgetVersion> budgetVersionMap = budgetVersionList.parallelStream()
                .collect(Collectors.toMap(b -> b.getBudgetVersion().toLowerCase().trim(), Function.identity()));

        int count = 0;
        
        
        for (MonthlyBudgetAccountsDTO acc : dto) {
            MonthlyBudgetAccounts m = new MonthlyBudgetAccounts();
            MonthlyBudgetAccountsPK pk = new MonthlyBudgetAccountsPK();
            String branch = acc.getId().getBranch().toLowerCase();
            String costCode = acc.getId().getCostCode().toUpperCase();
            String budgetVersion = acc.getId().getBudgetVersion().toLowerCase();
            try {
                // Set the branch
                Branch branchObj = branchMap.getOrDefault(branch.substring(branch.length() - 4), null);
                if (branchObj == null) {
                    // Handle case where branch is not found
                    branchObj = branchRepo.findById(900); // Replace with default branch object
                }
                pk.setBranch(branchObj);

                // Set the cost code
                CostCodes costCodeObj = costCodesMap.get(costCode.trim());
                if (costCodeObj == null) {
                    // Handle case where cost code is not found
                    throw new NoSuchElementException("Cost code not found");
                }
                pk.setCostCode(costCodeObj);

                // Set the budget version
                BudgetVersion budgetVersionObj = budgetVersionMap.get(budgetVersion.trim());
                if (budgetVersionObj == null) {
                    // Handle case where budget version is not found
                    throw new NoSuchElementException("Budget version not found");
                }
                pk.setBudgetVersion(budgetVersionObj);

                // Set other fields
                pk.setAccount(acc.getId().getAccount());
                pk.setFinancialYear(acc.getId().getFinancialYear());

                m.setId(pk);
                m.setTotal(acc.getTotal());
                m.setMarch(acc.getMarch());
                m.setApril(acc.getApril());
                m.setMay(acc.getMay());
                m.setJune(acc.getJune());
                m.setJuly(acc.getJuly());
                m.setAugust(acc.getAugust());
                m.setSeptember(acc.getSeptember());
                m.setOctober(acc.getOctober());
                m.setNovember(acc.getNovember());
                m.setDecember(acc.getDecember());
                m.setJanuary(acc.getJanuary());
                m.setFebruary(acc.getFebruary());
                m.setInsertDate(acc.getInsertDate().trim());

            monthlyBudgetAccountsRepo.save(m);

            // Clear the DTO to free up memory
            dto.set(count, null);
            count++;

            if (count % 1000 == 0) {
                System.out.println(count + " records processed");
                long endTime = System.currentTimeMillis();
                System.out.println(count + " - Time Taken: " + (endTime - startTime) + "s");
                startTime = 0;
            }
        } catch (NoSuchElementException e) {
            // Handle any errors that occur while processing the record
            e.printStackTrace();
        }
        
        }
        return count;
}
  • Use parallel streams: Since the code is using streams to collect data from the database, it can benefit from using parallel streams to improve performance. By default, streams are processed sequentially, but parallel streams can divide the workload into multiple threads, which can be executed concurrently.
  • 4
    Parallelising is almost never the answer. If your DB can't single-thread process 1000 inserts without taking 9 hours to do it, you have a problem. 'sod it, parallleize!' is not a good solution to this problem. Given your code, there is no enough detail to just tell you _why_. Ask the DB to tell you why - use `EXPLAIN`, etc. Perhaps you are missing indexes. – rzwitserloot May 02 '23 at 10:43
  • 1
    My first step would be to do a performance analysis. Find out your exact points where things are slow. Is it your data I/O? Is it your processing/calculations? You seem to have started that, but you are only measuring your overall time for a batch of records. Narrow it down further before trying to optimize. Right now, you don't even know if it's your code, the network, or the database. – gmiley May 02 '23 at 10:50
  • 1
    You could separate the database read and write operations. For the read operation you could parallelize the requests and store the result to a thread-safe queue or list; For the write operation, you could make use of batch insert and save 100 or more records to DB one time. That will save much DB write time. – Stone May 02 '23 at 12:20

1 Answers1

0

I changed my method...

  • Divided the list into smaller chunks and process each chunk in a separate thread, allowing me to take advantage of multiple CPU cores and speed up the processing.

  • Used the saveAll() method provided by Spring Data JPA to save a batch of entities in a single database transaction. This reduced the overhead of individual transactions and improved performance.

  • Used HashMaps

     public int saveMonthlyBudgetAccounts(List<MonthlyBudgetAccountsDTO> dto) {
     // create a hashmap of branch codes to branch objects
     Map<String, Branch> branchMap = new HashMap<>();
     for (Branch b : branchRepo.findAll()) {
         branchMap.put(b.getSapBranch().toLowerCase(), b);
     }
    
     // create a hashmap of cost code descriptions to cost code objects
     Map<String, CostCodes> costCodesMap = new HashMap<>();
     for (CostCodes c : costCodesRepo.findAll()) {
         costCodesMap.put(c.getCostCodeDescription().toUpperCase().trim(), c);
     }
    
     // create a hashmap of budget versions to budget version objects
     Map<String, BudgetVersion> budgetVersionMap = new HashMap<>();
     for (BudgetVersion b : budgetVersionRepo.findAll()) {
         budgetVersionMap.put(b.getBudgetVersion().toLowerCase().trim(), b);
     }
    
     int count = 0;
    
     int batchSize = 100;
     List<MonthlyBudgetAccountsDTO> dtoList = new ArrayList<>(dto);
     int numThreads = (int) Math.ceil((double) dtoList.size() / batchSize);
     ExecutorService executor = Executors.newFixedThreadPool(numThreads);
    
     try {
         for (int i = 0; i < dtoList.size(); i += batchSize) {
             int endIndex = Math.min(i + batchSize, dtoList.size());
             List<MonthlyBudgetAccountsDTO> subList = dtoList.subList(i, endIndex);
             executor.execute(() -> {
                 long startTime = System.currentTimeMillis();
                 List<MonthlyBudgetAccounts> entities = new ArrayList<>();
                 for (MonthlyBudgetAccountsDTO acc : subList) {
                     MonthlyBudgetAccounts m = new MonthlyBudgetAccounts();
                     MonthlyBudgetAccountsPK pk = new MonthlyBudgetAccountsPK();
                     String branch = acc.getId().getBranch().toLowerCase();
                     String costCode = acc.getId().getCostCode().toUpperCase();
                     String budgetVersion = acc.getId().getBudgetVersion().toLowerCase();
    
                     // Set the branch
                     Branch branchObj = branchMap.getOrDefault(branch.substring(branch.length() - 4), null);
                     if (branchObj == null) {
                         // Handle case where branch is not found
                         branchObj = branchRepo.findById(900); // Replace with default branch object
                     }
                     pk.setBranch(branchObj);
    
                     // Set the cost code
                     CostCodes costCodeObj = costCodesMap.get(costCode.trim());
                     if (costCodeObj == null) {
                         // Handle case where cost code is not found
                         throw new NoSuchElementException("Cost code not found");
                     }
                     pk.setCostCode(costCodeObj);
    
                     // Set the budget version
                     BudgetVersion budgetVersionObj = budgetVersionMap.get(budgetVersion.trim());
                     if (budgetVersionObj == null) {
                         // Handle case where budget version is not found
                         throw new NoSuchElementException("Budget version not found");
                     }
                     pk.setBudgetVersion(budgetVersionObj);
    
                     // Set other fields
                     pk.setAccount(acc.getId().getAccount());
                     pk.setFinancialYear(acc.getId().getFinancialYear());
    
                     m.setId(pk);
                     m.setTotal(acc.getTotal());
                     m.setMarch(acc.getMarch());
                     m.setApril(acc.getApril());
                     m.setMay(acc.getMay());
                     m.setJune(acc.getJune());
                     m.setJuly(acc.getJuly());
                     m.setAugust(acc.getAugust());
                     m.setSeptember(acc.getSeptember());
                     m.setOctober(acc.getOctober());
                     m.setNovember(acc.getNovember());
                     m.setDecember(acc.getDecember());
                     m.setJanuary(acc.getJanuary());
                     m.setFebruary(acc.getFebruary());
                     m.setInsertDate(acc.getInsertDate().trim());
    
                     entities.add(m);
    
                 }
    
                 monthlyBudgetAccountsRepo.saveAll(entities);
    
             });
             count += batchSize;
         }
     } finally {
         executor.shutdown();
         try {
             executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
         } catch (InterruptedException e) {
             // TODO Auto-generated catch block
             e.printStackTrace();
         }
     }
    
     // Clear the DTO to free up memory
     dto.set(count, null);
     return count;
    

    }