0

I am using Spring + Hibernate on my JavaEE project.

In this project the user can upload an XLS file which I should import to my database. Before importing I have to validate this file checking its integrity with the other entities on my database. So I have more or less the following:

// The importer
@Component("importer")
public class Importer {

    @Autowired
    FirstDAO firstDao;

    @Autowired
    SecondDAO secondDao;

    // Read the file and open it (65.000 lines for example)

    public void validate() {

        foreach line in the file {
            firstDAO.has(line[col1]);
            secondDao.has(line[col2]);
        }

        // It stores the valid objects in a List and persist them at the end 

    }
}

// The DAO
@Repository
public class FirstDao {

    @PersistenceContext
    protected EntityManager entityManager;

    @Transactional(propagation = Propagation.NOT_SUPPORTED)
    public boolean has(String name) {

         List<Object> result = entityManager.createQuery( from FIRST_TABLE where name = :name)
         .setParameter("name", name)
         .getResultList();

         if (result.size > 0) return true;
         else return false;
    }

}

// The PersistenceContext/Hibernate configuration

<!-- Data Source -->
<jee:jndi-lookup id="myDS" jndi-name="jdbc/my-DS" cache="true" proxy-interface="javax.sql.DataSource" />

<!-- Entity Manager -->
<bean id="entityManagerFactory" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
    <property value="classpath:META-INF/my_persistence.xml" name="persistenceXmlLocation"/>
    <property name="dataSource" ref="myDS"/>
    <property name="persistenceUnitName" value="myPersistenceUnit" />
    <!--
    <property name="loadTimeWeaver">
        <bean class="org.springframework.instrument.classloading.InstrumentationLoadTimeWeaver"/>
    </property>
    -->
    <property name="jpaVendorAdapter">
        <bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter">
            <property name="database" value="ORACLE" />
            <property name="showSql" value="false" />
        </bean>
    </property>
</bean>

After logging the application I have noticed:

  • For each query (has method on my DAO) a connection is opened and closed with my Database.
  • The memory on the server is being flooded (probably memory leak).
  • After a lot of opening and closing connections I have a connection reset from the Database. Don't know why. And if I still keep requesting coonections, the Datasource is suspended.

I have read somethings about entityManager but I still don't know if I am doing it right, so:

  • Is it right to execute the validation in a for loop that way? (One connection for each item, meaning 130.000 connections open and closed in a 65000 lines file)
  • I have read about Stateless Persistence Context for the entityManager. I suspect the memory leak may be there. Maybe Hibernate is kepting a lot of objects in the PersistenceContext. How do I tell Entity Manager to not cache those guys when validating?

Thanks in advance.

João Menighin
  • 3,083
  • 6
  • 38
  • 80
  • Your question sounds like a list of questions. But about conecting to the DB for each line of the file, that will drag your system's performance down by a lot. Can't you iterate through the entire file, load it into memory, and then run a single select statement to retrieve the db data you need to compare with the loaded file? Load all the data into memory and then validade all at once. You might need to fix your memory leak issue first. – wdoering Oct 02 '15 at 20:47

1 Answers1

0

First of all, you really shouldn't do that line by line unless you have a very very good reason. Even if the data size is bigger than your memory you should do that 1000 lines at a time or something like that but definitely not one by one. Because one of the most important optimization for database usage is reducing number of database hit.

Secondly you should not retrieve the data just to check if it is exist. You should use a basic "select count" query. By that way you will get rid of all stuff like consuming IO to read data and retrieving that data through network to your server and spending memory to just get the number of object in that list.

If you will use my first advice and check the existing of records not one at a time but 1000s at a time you can select just the names instead of all rows.

Btw as far as I can see you are using a datasource if that is properly configured like number of max connection etc. you shouldn't worry about number of database connection.

cool
  • 1,746
  • 1
  • 14
  • 15