Suppose we have an entity "Something" that has a one-to-many relationship (on the order of millions) to some "Data".

Something 1 -> * Data
Data 1 -> 1 Something

Now, if we want to add some Data objects, we have to do this:

Something.getDataList().add(Data)

This will actually pull all the Data objects from the database, which is not optimal IMHO.

However, if I remove the relationship from Something and leave it only in Data, I'll be able to add and retrieve exactly the objects I ask for, using a DAO:

Something
Data 1 -> 1 Something

Now the data access interface will look like this:

Something.addData(Data) // will use DataDAO to save object

or

Something.addData(List<Data>) // will use same DataDAO batch insert

I need some guidance on this. Maybe I lack some knowledge of JPA and there is no need for this? I'm also unsure because the entities don't feel natural this way: the data is provided by the entity's methods but is not actually contained in the entity. (If this approach is right, then I should remove every one-to-many relationship wherever a performance-critical operation deals with that particular entity, which also feels unnatural.)

In my particular case I have a lot of REST consumers that are going to update the database periodically. I'm using ObjectDB and JPA, but the question is more abstract than that.

Abhinav Sarkar
vach
  • can use a good old insert statement if you want. – Scary Wombat Mar 12 '14 at 07:45
  • Not really: mixing native SQL inserts with JPA can cause a lot of problems, since the EntityManager wouldn't know of the new records until a refresh/merge is performed, the cache would be invalid, etc. – Tasos P. Mar 12 '14 at 07:47
  • That's why I don't want to use SQL statements. But with the second approach the EntityManager is used in the implementation, so it is aware of everything... Am I right on this? Has anyone done something similar? – vach Mar 12 '14 at 07:51
  • with a factor of millions as child objects, you ain't likely to want to cache these anyway. – Scary Wombat Mar 12 '14 at 07:58
  • I think the query cache will be enough for me. I'm hesitant to make this change, as it is like changing the architecture. My conclusion is that it's better to use the second approach. Also, in this case two entities with a many-to-many relationship to each other can be replaced with a one-to-many relationship... – vach Mar 12 '14 at 08:02
  • Using NamedQueries in DataDAO with proper query caching hints (depending on your JPA provider) can help you maintain a usable cache containing only the results you actually need. – Tasos P. Mar 12 '14 at 08:04
  • Thanks I'll check them out, never used NamedQueries. – vach Mar 12 '14 at 08:07
  • @Cascader using a native query for that case is IMHO a good solution, because the loaded `Something` has probably not loaded those millions of `Data`. And if it has, there are options, like clearing the PersistenceContext or calling `refresh()`. – V G Mar 12 '14 at 09:03
  • @Java1 regarding "with a factor of millions as child objects, you ain't likely to want to cache these any way": let me disagree. My application will have multiple consumers that query the database for the same objects while being unaware of each other (working in the same VM), so it really makes sense to cache them. Something like a HashMap with weak references and a size of a few million should give really good memory management, as more than one consumer will reuse the same entity (entities are immutable)... – vach Mar 29 '14 at 13:04

1 Answer

I believe that having something.getDataList() around is a ticking time bomb if there are millions of Data records related to Something. Just as you said, calling something.getDataList().add(data) would fetch the whole data set from the DB in order to perform a single insert. Also, anyone could be tempted to use something.getDataList().size() to get the number of records, resulting in the same overhead.
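To make the cost concrete, here is a toy model in plain Java (no JPA involved). `LazyList` and `fakeTable` are hypothetical names made up for this illustration; the sketch only mimics the behaviour described above, where the first touch of the collection loads every row. Whether a real provider initializes the collection on `add` depends on the provider and the mapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class LazyListSketch {

    /** Toy lazy collection: loads every row from the backing loader on first use. */
    static class LazyList<T> {
        private final Supplier<List<T>> loader;
        private List<T> loaded;      // stays null until the first access
        int rowsFetched = 0;         // how many rows the first access pulled in

        LazyList(Supplier<List<T>> loader) {
            this.loader = loader;
        }

        private List<T> init() {
            if (loaded == null) {
                loaded = new ArrayList<>(loader.get());
                rowsFetched = loaded.size();
            }
            return loaded;
        }

        // Both operations have to initialize the whole collection first.
        void add(T item) { init().add(item); }
        int size()       { return init().size(); }
    }

    /** Stand-in for the Data table: builds the requested number of rows in memory. */
    static List<String> fakeTable(int rows) {
        List<String> table = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            table.add("data-" + i);
        }
        return table;
    }

    public static void main(String[] args) {
        LazyList<String> dataList = new LazyList<>(() -> fakeTable(100_000));
        dataList.add("one new row");
        System.out.println("rows fetched to add a single item: " + dataList.rowsFetched);
    }
}
```

In this model a single `add` or `size` call pays for the whole table, which is the overhead the DAO approach avoids.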

My suggestion is that you use the DataDAO for such operations (i.e. adding or counting) like:

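// em is assumed to be an injected EntityManager; the NamedQuery names below are placeholders

void addData(Something something, Data data) {
    // Something similar can be used for batch insert
    data.setSomething(something);
    em.persist(data);
}

List<Data> findData(Something something, MyFilter filter) {
    // use a NamedQuery with parameters populated by something and your filters
    return em.createNamedQuery("Data.findBySomething", Data.class)
             .setParameter("something", something)
             .getResultList();
}

Long countData(Something something) {
    // NamedQuery: 'SELECT COUNT(d) FROM Data d WHERE d.something = :something'
    return em.createNamedQuery("Data.countBySomething", Long.class)
             .setParameter("something", something)
             .getSingleResult();
}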
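
For the batch-insert case, a common pattern is to persist in chunks and flush and clear the persistence context after each chunk, so that managed entities do not pile up in memory. Below is a minimal, self-contained sketch of that idea. `Store` is a hypothetical stand-in for the three EntityManager calls involved (`persist`, `flush`, `clear`), `InMemoryStore` only records what happens, and the batch size is an arbitrary choice to tune for your provider.

```java
import java.util.ArrayList;
import java.util.List;

class BatchInsertSketch {

    /** Hypothetical stand-in for the EntityManager methods used below. */
    interface Store {
        void persist(Object entity);
        void flush();
        void clear();
    }

    /** In-memory Store that records written objects and the number of flushes. */
    static class InMemoryStore implements Store {
        final List<Object> pending = new ArrayList<>();
        final List<Object> written = new ArrayList<>();
        int flushCount = 0;

        public void persist(Object entity) { pending.add(entity); }

        public void flush() {
            written.addAll(pending);
            pending.clear();
            flushCount++;
        }

        public void clear() {
            // a real persistence context would detach its managed entities here
        }
    }

    static class Something {
        final long id;
        Something(long id) { this.id = id; }
    }

    static class Data {
        Something something;   // owning side of the relationship
        final String payload;
        Data(String payload) { this.payload = payload; }
        void setSomething(Something something) { this.something = something; }
    }

    /** Persists the batch in chunks, flushing and clearing after each chunk. */
    static void addData(Store store, Something something, List<Data> batch, int batchSize) {
        int count = 0;
        for (Data data : batch) {
            data.setSomething(something);
            store.persist(data);
            if (++count % batchSize == 0) {
                store.flush();
                store.clear();
            }
        }
        if (count % batchSize != 0) {   // write out the last partial chunk
            store.flush();
            store.clear();
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        List<Data> batch = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            batch.add(new Data("row-" + i));
        }
        addData(store, new Something(1L), batch, 2);
        System.out.println("written=" + store.written.size() + " flushes=" + store.flushCount);
    }
}
```

With a real EntityManager, the `addData` loop would stay the same and run inside a transaction; note that `clear()` detaches everything in the persistence context, including `something` if it was managed.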
Tasos P.
  • So you agree that my Something entity should not have a List field, and that Data should instead have a reference to "Something"? – vach Mar 12 '14 at 08:05