I am using Spring Data with EclipseLink JPA to do server side pagination on a database result set. I have everything working and I get the expected paged results, but I noticed performance suffering on large data sets (several million rows). It is taking about 5 minutes to return a page of 20 results. Perhaps this is to be expected, but what concerned me was the query output.
My log output:
SELECT COUNT(filename) FROM document
SELECT filename, datecaptured, din, docdate, docid, doctype, drawer, foldernumber, format, pagenumber, tempfilename, userid FROM document ORDER BY din ASC
I would understand that in order to page, Spring would need to know the max row count, so the first query makes sense.
The second query is pulling the entire database when I specifically only asked for 20 results with a 0 offset (page).
Does Spring/EclipseLink/JPA in fact grab the entire data set and then only return the subset paged request?
If that is the case, how should I modify my repository class to be more efficient?
My test case:
@Test
public void getPagedDocumentsTest() throws IOException {
Page<Document> requestedPage = documentRepository.findAll(new PageRequest(0, 20, Sort.Direction.ASC, "din"));
Assert.assertNotNull("Page is null", requestedPage);
Assert.assertNotNull("Page is empty", requestedPage.getContent());
List<Document> documents = requestedPage.getContent();
LOG.info("{}", documents);
LOG.info("{}", documents.size());
}
My repository class:
import org.springframework.data.jpa.repository.JpaSpecificationExecutor;
import org.springframework.data.repository.PagingAndSortingRepository;
import org.springframework.stereotype.Repository;
import com.example.data.model.Document;
@Repository
public interface DocumentRepository extends PagingAndSortingRepository<Document, String> {
}
Edit - per @Chris's suggestion
Tried adding the platform to my properties, but it didn't make a difference:
eclipselink.weaving=static
eclipselink.allow-zero-id=true
eclipselink.target-database=SQLServer
eclipselink.logging.level=FINE
Also tried adding it to my configuration (I'm using Java Config):
@Bean
public LocalContainerEntityManagerFactoryBean entityManager() {
LocalContainerEntityManagerFactoryBean factory = new LocalContainerEntityManagerFactoryBean();
factory.setPersistenceUnitName("ExampleUnit");
factory.setPackagesToScan("com.example.data.model");
EclipseLinkJpaVendorAdapter eclipseLinkVendorAdapter = new EclipseLinkJpaVendorAdapter();
eclipseLinkVendorAdapter.setDatabase(Database.SQL_SERVER);
eclipseLinkVendorAdapter.setDatabasePlatform("SQLServer");
factory.setJpaVendorAdapter(eclipseLinkVendorAdapter);
factory.setDataSource(dataSource());
factory.setJpaProperties(jpaProperties());
factory.setLoadTimeWeaver(new InstrumentationLoadTimeWeaver());
return factory;
}
Looks like the platform is set correctly.
[EL Config]: connection: 2015-08-06 12:04:05.691--ServerSession(686533955)--Connection(1896042043)--Thread(Thread[main,5,main])--connecting(DatabaseLogin(
platform=>SQLServerPlatform
user name=> ""
connector=>JNDIConnector datasource name=>null
))
But neither helped. The SQL query output remained the same as well.
Edit
Found a related question with a similar answer from @Chris: