
We are having an issue processing a huge dataset. (It is not that huge in absolute terms, but it is huge for us, as we have never had to pull more than 100k rows before with Spring MVC + Oracle.)

The requirement is this: we need to generate reports every month and mail them to the client's sales team. The report is generated as an Excel file, and this is where we are pulling such huge data.

We are using Spring MVC and Oracle as the DB. When I run the query directly in the DB, it pulls the data effortlessly within 2 seconds, consistently. But when I call it through SimpleJdbcCall or a stored procedure in Spring, I get a Java Heap Space error. I increased the heap space to 2 GB, but it was still no use. My system configuration is:

i3-4160 @ 3.6 GHz (quad core?) / 8 GB RAM / Windows 7 Pro 64-bit.

Please help me figure out where I'm going wrong and how to handle this. The code is pasted below.

FlexiView.java (which extends StoredProcedure)

package com.web.helper;

import java.util.List;
import java.util.Map;

import oracle.jdbc.OracleTypes;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.SqlOutParameter;
import org.springframework.jdbc.core.SqlParameter;
import org.springframework.jdbc.object.StoredProcedure;

import com.domain.ViewFormDO;

public class FlexiView extends StoredProcedure {

    public FlexiView(JdbcTemplate jdbcTemplate, final String storedProc, ViewFormDO viewFormDO, int reportId){

        super(jdbcTemplate, storedProc);
        jdbcTemplate.setFetchSize(20000);

        declareParameter(new SqlOutParameter("P_CUR", OracleTypes.CURSOR, new FlexiViewMapper(reportId, viewFormDO)));
        declareParameter(new SqlParameter("P_AGENT_ID", OracleTypes.NUMBER));

        this.compile();
    }

    @SuppressWarnings("unchecked")
    public <T> List<T> executeStoredProc(final Map<String, Object> valueMap) {
        System.out.println(">>"+valueMap.size());
        // execute stored procedure
        Map<String, Object> resultMap = super.execute(valueMap);
        return (List<T>)resultMap.get("P_CUR");
    }

}

FlexiViewMapper.java (the custom RowMapper)

package com.web.helper;

import java.sql.ResultSet;
import java.sql.SQLException;
import java.text.SimpleDateFormat;

import org.springframework.jdbc.core.RowMapper;

import com.domain.ViewFormDO;
import com.domain.ViewFormGridDO;

public class FlexiViewMapper implements RowMapper<Object> {

    int reportId = 0;
    final SimpleDateFormat dtFormat = new SimpleDateFormat("dd-MM-yyyy");
    ViewFormDO viewFormDO;

    public FlexiViewMapper(int reportId, ViewFormDO viewFormDO){
        this.reportId = reportId;
        this.viewFormDO = viewFormDO;
    }

    @Override
    public Object mapRow(ResultSet rs, int rowNum) throws SQLException {
        ViewFormGridDO flexiActiveGridDet = new ViewFormGridDO();
        if (reportId == 1001 || reportId == 1002) {
            // Get data from DB and assign it in ViewFormGridDO Bean
        }
        return flexiActiveGridDet;
    }

}

And I'm calling this in the DAO, like this:

@Override
public List<ViewFormGridDO> getFlexiGuideEnqGridDet1(final ViewFormDO viewFormDO)
        throws NsureException {
    List<ViewFormGridDO> flexiGuideGridDetails = null;
    final int reportId = viewFormDO.getFlexiTypeId() == GlobalNames.SCHEME_XXX ? 1002 : viewFormDO.getReportId();

    LinkedHashMap<String, Object> valueMap = new LinkedHashMap<String, Object>();
    valueMap.put("P_AGENT_ID", viewFormDO.getAgentId());

    FlexiView flexiView = new FlexiView(jdbcTemplate, "PKG_XXXXX.prGetXXXXX", viewFormDO, reportId); 
    flexiGuideGridDetails = flexiView.executeStoredProc(valueMap);

    return flexiGuideGridDetails;
}

FINAL UPDATE (a.k.a. Solution?)

I ended up doing this: my bean ViewFormDO had 100+ fields, but I needed only 20-30 of them. Because of this, the list of beans itself easily crossed the 500 MB mark. To avoid this, I changed the row mapper in FlexiViewMapper to return a JSONObject instead of the full bean.
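
For reference, a minimal sketch of what the slimmed-down mapper could look like, assuming a recent org.json (where JSONException is unchecked); the column names here are hypothetical:

import java.sql.ResultSet;
import java.sql.SQLException;

import org.json.JSONObject;
import org.springframework.jdbc.core.RowMapper;

public class FlexiViewJsonMapper implements RowMapper<JSONObject> {

    @Override
    public JSONObject mapRow(ResultSet rs, int rowNum) throws SQLException {
        // Copy only the 20-30 columns the report actually needs instead of
        // populating a bean with 100+ fields, most of which stay empty.
        JSONObject row = new JSONObject();
        row.put("agentId", rs.getLong("AGENT_ID"));         // hypothetical column
        row.put("schemeName", rs.getString("SCHEME_NAME")); // hypothetical column
        // ... remaining report columns ...
        return row;
    }
}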

Still, generating the Excel file with 700k (7 lakh) records produced a heap space error, so I capped the limit at 500k (5 lakh). I also changed the code logic to eliminate unnecessary values, which brought the data down from 7.2 million (72 lakh) rows to 200k (2 lakh); yes, a lot of useless combinations were being generated. After all these optimizations, even if I select ALL filters to generate the report, the row count does not cross the 5 lakh mark, and I hope it stays that way. We have covered all possible scenarios!
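
The post doesn't say which Excel library is in use, but if it is Apache POI, a streaming workbook (SXSSF) keeps only a small window of rows in memory while writing, which avoids exactly this kind of heap error. A minimal sketch under that assumption (the JSON keys are the hypothetical ones from the mapper sketch above):

import java.io.FileOutputStream;
import java.util.List;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.json.JSONObject;

public class ReportWriter {

    public void write(List<JSONObject> rows, String path) throws Exception {
        // Keep at most 100 rows in memory; older rows are flushed to a
        // temporary file instead of accumulating on the heap.
        SXSSFWorkbook wb = new SXSSFWorkbook(100);
        try (FileOutputStream out = new FileOutputStream(path)) {
            Sheet sheet = wb.createSheet("Report");
            int r = 0;
            for (JSONObject json : rows) {
                Row row = sheet.createRow(r++);
                row.createCell(0).setCellValue(json.optLong("agentId"));      // hypothetical key
                row.createCell(1).setCellValue(json.optString("schemeName")); // hypothetical key
            }
            wb.write(out);
        } finally {
            wb.dispose(); // delete the temporary files backing the streamed rows
        }
    }
}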

  • How large is one entry? You haven't shown your DTO, so I can't do it for you, but if you calculate the sizes of your fields etc., you can work out how much memory you need. – dunni Sep 19 '17 at 15:01
  • But even if you can calculate how much memory you need... you are only postponing the issue if your data will increase. Maybe it's better to use some pagination and build the report page by page. – Angelo Immediata Sep 19 '17 at 15:03
  • @dunni The DTO consists of 400 fields, and we will use 50% of those fields at any point in time. – Rajkumar Sep 20 '17 at 05:38
  • @AngeloImmediata If I remember correctly, even if I offset data in Oracle, it will still fetch all the data and then limit the rows. So I don't think there will be any benefit that way. – Rajkumar Sep 20 '17 at 05:39
  • I don't know your Oracle version... but you may consider this http://docs.oracle.com/javadb/10.6.2.1/ref/rrefsqljoffsetfetch.html – Angelo Immediata Sep 20 '17 at 06:47
  • @AngeloImmediata Will try that and update it here. – Rajkumar Sep 20 '17 at 06:48
  • @AngeloImmediata I have used an offset of 100,000; every 100k records takes 12 seconds to process, and a total of 800k records takes ~2 minutes. Almost 80% of the time is spent just reading the result set and only 20% writing it to Excel. – Rajkumar Sep 20 '17 at 11:59
  • So the OOM issue is solved, and the problem now is that the process is too slow. Am I right? It makes sense to spend almost 80% of the time reading the result set (JDBC is a pretty slow protocol), but you should verify on the server machine... I guess performance is better there. – Angelo Immediata Sep 20 '17 at 13:15
  • @AngeloImmediata Hi, now the problem is that when I test the function from JUnit it works fine and the total processing is done within 50 seconds, but if I call the same function from the application I get a heap space error. Any idea of the reason behind this ordeal? – Rajkumar Sep 27 '17 at 05:48
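
For what it's worth, a minimal sketch of the page-by-page reading discussed in the comments above, assuming Oracle 12c's OFFSET/FETCH row limiting; the view and column names are hypothetical:

import java.util.List;

import org.springframework.jdbc.core.JdbcTemplate;

public class PagedReportReader {

    private static final int PAGE_SIZE = 100000;

    private final JdbcTemplate jdbcTemplate;

    public PagedReportReader(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public void readAllPages(long agentId) {
        int offset = 0;
        List<String> page;
        do {
            // Oracle 12c+ row limiting; a stable ORDER BY is required so
            // that consecutive pages do not overlap or skip rows.
            page = jdbcTemplate.query(
                    "SELECT scheme_name FROM report_view WHERE agent_id = ? "
                    + "ORDER BY row_id OFFSET ? ROWS FETCH NEXT ? ROWS ONLY",
                    new Object[] { agentId, offset, PAGE_SIZE },
                    (rs, rowNum) -> rs.getString("scheme_name"));
            // write this page to the Excel sheet here, then discard it
            offset += PAGE_SIZE;
        } while (page.size() == PAGE_SIZE);
    }
}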

1 Answer


Try setting the fetch size to a smaller value. The default is 10, and values larger than 100 rarely provide a significant benefit, yet you are setting it to 20,000. The amount of memory allocated by the Oracle Database JDBC driver depends in part on the fetch size: the larger the fetch size, the more memory. Setting the fetch size to a much smaller value is very likely to solve your problem; I'd suggest 100 as a start. See this answer for more details.
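
In terms of the question's own FlexiView constructor, the suggested change is just:

// was: jdbcTemplate.setFetchSize(20000);
jdbcTemplate.setFetchSize(100); // 100 rows per round trip; raise only if measurements justify it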

  • I started with 100, moved to 200, and so on up to 20,000. I'm getting the Java Heap Space error either way. – Rajkumar Sep 20 '17 at 05:37
  • Where does the ResultSet passed to mapRow get closed? – Douglas Surber Sep 20 '17 at 15:34
  • JDBCTemplate will close the ResultSet, and I'm sure all this boilerplate code is handled by JDBCTemplate itself. – Rajkumar Sep 21 '17 at 04:53
  • I know nothing about `JDBCTemplate`, so you may be right. But, ... The `ResultSet` in question is not a normal `ResultSet`, e.g. it is not one returned by calling `executeQuery`. It is the result of calling `getCursor`. It may be the case that `JDBCTemplate` does not close that `ResultSet`. Since `setFetchSize` isn't affecting the problem (you should set it to 100, not 20,000), something else is. Leaving a `ResultSet` open could cause this problem. – Douglas Surber Sep 21 '17 at 14:49
  • On further consideration, I think it very likely that your app is leaving the cursor `ResultSets` open. Cursors are Oracle specific. Spring doesn't know anything about them so far as I know. So it seems very likely that Spring has no code to close them, as that would be Oracle-specific code. – Douglas Surber Sep 22 '17 at 15:11
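
If Spring really is leaving the cursor ResultSet open, one way to rule that out is to call the procedure through plain JDBC and close everything explicitly. A minimal sketch, assuming the parameter order declared in the question's FlexiView (P_CUR first, P_AGENT_ID second):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;

import javax.sql.DataSource;

import oracle.jdbc.OracleTypes;

public class CursorCloseCheck {

    public void readAndClose(DataSource dataSource, long agentId) throws Exception {
        try (Connection con = dataSource.getConnection();
             CallableStatement cs = con.prepareCall("{call PKG_XXXXX.prGetXXXXX(?, ?)}")) {
            cs.registerOutParameter(1, OracleTypes.CURSOR); // P_CUR
            cs.setLong(2, agentId);                         // P_AGENT_ID
            cs.execute();
            // Close the REF CURSOR's ResultSet explicitly; if it stays open,
            // the driver keeps its row buffers alive for the whole run.
            try (ResultSet rs = (ResultSet) cs.getObject(1)) {
                rs.setFetchSize(100);
                while (rs.next()) {
                    // map each row here
                }
            }
        }
    }
}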