
Using Spring and Hibernate, I want to write to one MySQL master database and read from one or more replicated slaves in a cloud-based Java webapp.

I can't find a solution that is transparent to the application code. I don't really want to have to change my DAOs to manage different SessionFactories, as that seems really messy and couples the code to a specific server architecture.

Is there any way of telling Hibernate to automatically route CREATE/UPDATE queries to one datasource, and SELECT to another? I don't want to do any sharding or anything based on object type - just route different types of queries to different datasources.

Deejay
  • Have you got both UPDATE/CREATE and SELECT queries in the same DAO/service? One option could be to split those out (making setting their data sources that much easier) – Martijn Verburg Dec 08 '10 at 10:06
  • Hmm, that sounds like the most sensible option I've seen so far. Think I might give that a go if there's not a more 'transparent' option. Thanks! – Deejay Dec 08 '10 at 10:18
  • How about using MySQL proxy to split the read and write operations? Has anyone tried this? – nylund Dec 01 '12 at 16:09

5 Answers


An example can be found here: https://github.com/afedulov/routing-data-source.


Spring provides a variation of DataSource called AbstractRoutingDataSource. It can be used in place of standard DataSource implementations and enables a mechanism to determine which concrete DataSource to use for each operation at runtime. All you need to do is extend it and provide an implementation of the abstract determineCurrentLookupKey method. This is the place to implement your custom logic for determining the concrete DataSource. The returned Object serves as a lookup key; it is typically a String or an Enum, used as a qualifier in the Spring configuration (details follow below).

package website.fedulov.routing;

import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class RoutingDataSource extends AbstractRoutingDataSource {
    @Override
    protected Object determineCurrentLookupKey() {
        return DbContextHolder.getDbType();
    }
}

You might be wondering what that DbContextHolder object is and how it knows which DataSource identifier to return. Keep in mind that the determineCurrentLookupKey method will be called whenever the TransactionManager requests a connection. It is important to remember that each transaction is "associated" with a separate thread; more precisely, the TransactionManager binds the Connection to the current thread. Therefore, in order to dispatch different transactions to different target DataSources, we have to make sure that every thread can reliably identify which DataSource it is supposed to use. This makes it natural to use a ThreadLocal variable for binding a specific DataSource to a Thread, and hence to a Transaction. This is how it is done:

public enum DbType {
   MASTER,
   REPLICA1,
}

public class DbContextHolder {

   private static final ThreadLocal<DbType> contextHolder = new ThreadLocal<DbType>();

   public static void setDbType(DbType dbType) {
       if (dbType == null) {
           throw new NullPointerException();
       }
       contextHolder.set(dbType);
   }

   public static DbType getDbType() {
       return contextHolder.get(); // no cast needed, the ThreadLocal is typed
   }

   public static void clearDbType() {
       contextHolder.remove();
   }
}

As you can see, you can also use an enum as the key, and Spring will take care of resolving it correctly based on its name. The associated DataSource configuration and keys might look like this:

  ....
<bean id="dataSource" class="website.fedulov.routing.RoutingDataSource">
 <property name="targetDataSources">
   <map key-type="website.fedulov.routing.DbType">
     <entry key="MASTER" value-ref="dataSourceMaster"/>
     <entry key="REPLICA1" value-ref="dataSourceReplica"/>
   </map>
 </property>
 <property name="defaultTargetDataSource" ref="dataSourceMaster"/>
</bean>

<bean id="dataSourceMaster" class="org.apache.commons.dbcp.BasicDataSource">
  <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
  <property name="url" value="${db.master.url}"/>
  <property name="username" value="${db.username}"/>
  <property name="password" value="${db.password}"/>
</bean>
<bean id="dataSourceReplica" class="org.apache.commons.dbcp.BasicDataSource">
  <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
  <property name="url" value="${db.replica.url}"/>
  <property name="username" value="${db.username}"/>
  <property name="password" value="${db.password}"/>
</bean>
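
As an aside, if you prefer Java configuration over XML (see also the Baeldung link in the comments below), an equivalent setup might look roughly like this. This is a sketch: the hard-coded connection values stand in for the property placeholders above, and it assumes the RoutingDataSource and DbType from the earlier snippets.

import java.util.HashMap;
import java.util.Map;

import javax.sql.DataSource;

import org.apache.commons.dbcp.BasicDataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

import website.fedulov.routing.RoutingDataSource; // from the snippet above
// assumes DbType lives in the same package

@Configuration
public class DataSourceConfig {

    @Bean
    public DataSource dataSourceMaster() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("com.mysql.jdbc.Driver");
        ds.setUrl("jdbc:mysql://master-host:3306/mydb"); // stands in for ${db.master.url}
        ds.setUsername("user");                          // stands in for ${db.username}
        ds.setPassword("secret");                        // stands in for ${db.password}
        return ds;
    }

    @Bean
    public DataSource dataSourceReplica() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("com.mysql.jdbc.Driver");
        ds.setUrl("jdbc:mysql://replica-host:3306/mydb"); // stands in for ${db.replica.url}
        ds.setUsername("user");
        ds.setPassword("secret");
        return ds;
    }

    @Bean
    @Primary
    public DataSource dataSource() {
        RoutingDataSource routing = new RoutingDataSource();
        Map<Object, Object> targetDataSources = new HashMap<Object, Object>();
        targetDataSources.put(DbType.MASTER, dataSourceMaster());
        targetDataSources.put(DbType.REPLICA1, dataSourceReplica());
        routing.setTargetDataSources(targetDataSources);
        routing.setDefaultTargetDataSource(dataSourceMaster());
        return routing;
    }
}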

At this point you might find yourself doing something like this:

@Service
public class BookService {

  private final BookRepository bookRepository;
  private final Mapper mapper;

  @Inject
  public BookService(BookRepository bookRepository, Mapper mapper) {
    this.bookRepository = bookRepository;
    this.mapper = mapper;
  }

  @Transactional(readOnly = true)
  public Page<BookDTO> getBooks(Pageable p) {
    DbContextHolder.setDbType(DbType.REPLICA1);   // <----- set ThreadLocal DataSource lookup key
                                                  // all connections from here will go to REPLICA1
    Page<Book> booksPage = bookRepository.findAll(p);
    List<BookDTO> pContent = CollectionMapper.map(mapper, booksPage.getContent(), BookDTO.class);
    DbContextHolder.clearDbType();               // <----- clear ThreadLocal setting
    return new PageImpl<BookDTO>(pContent, p, booksPage.getTotalElements());
  }

  ... // other methods
}

Now we can control which DataSource will be used and forward requests as we please. Looks good!

...Or does it? First of all, those static method calls to a magical DbContextHolder really stick out. They look like they do not belong in the business logic. And they don't. Not only do they fail to communicate the purpose, they seem fragile and error-prone (what if you forget to clear the dbType?). And what if an exception is thrown between setDbType and clearDbType? We cannot just ignore it. We need to be absolutely sure that we reset the dbType; otherwise a Thread returned to the ThreadPool might be left in a "broken" state, trying to write to a replica in the next call. So we need this:

  @Transactional(readOnly = true)
  public Page<BookDTO> getBooks(Pageable p) {
    Page<Book> booksPage;
    List<BookDTO> pContent;
    try {
      DbContextHolder.setDbType(DbType.REPLICA1);   // <----- set ThreadLocal DataSource lookup key
                                                    // all connections from here will go to REPLICA1
      booksPage = bookRepository.findAll(p);
      pContent = CollectionMapper.map(mapper, booksPage.getContent(), BookDTO.class);
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      DbContextHolder.clearDbType();               // <----- make sure ThreadLocal setting is cleared
    }
    return new PageImpl<BookDTO>(pContent, p, booksPage.getTotalElements());
  }

Yikes >_< ! This definitely does not look like something I would like to put into every read only method. Can we do better? Of course! This pattern of "do something at the beginning of a method, then do something at the end" should ring a bell. Aspects to the rescue!

Unfortunately this post has already gotten too long to cover custom aspects in detail; you can follow up on them in the example repository linked above.
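
That said, here is a minimal sketch of what such an aspect could look like. The @ReadOnlyConnection annotation is a made-up name for illustration, not something from the original post; the aspect simply moves the try/finally out of the business logic.

// ReadOnlyConnection.java -- hypothetical marker annotation
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface ReadOnlyConnection {}

// ReadOnlyConnectionAspect.java
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;

@Aspect
@Component
@Order(0) // assumption: run outside @Transactional so the key is set before a connection is fetched
public class ReadOnlyConnectionAspect {

    @Around("@annotation(website.fedulov.routing.ReadOnlyConnection)")
    public Object routeToReplica(ProceedingJoinPoint pjp) throws Throwable {
        try {
            DbContextHolder.setDbType(DbType.REPLICA1); // <----- set before the method runs
            return pjp.proceed();
        } finally {
            DbContextHolder.clearDbType();              // <----- always cleared, even on exceptions
        }
    }
}

With that in place, getBooks shrinks back to a plain @Transactional(readOnly = true) method carrying a @ReadOnlyConnection annotation, and no DbContextHolder calls remain in the business logic.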

Alex Fedulov
  • In case you are looking to configure RoutingDataSource via Java instead of XML, go through: http://www.baeldung.com/spring-abstract-routing-data-source – NameNotFoundException Apr 26 '18 at 08:20
  • How would this work when the slave takes over the master role? Would the application have to keep checking the datasources to determine which one held the master role? – BX21 Sep 27 '18 at 09:15

I don't think that routing SELECTs to one DB (a slave) and CREATEs/UPDATEs to a different one (the master) is a very good decision. The reasons are:

  • replication is not instantaneous, so you could CREATE something in the master DB and, as part of the same operation, SELECT it from the slave and notice that the data hasn't yet reached the slave.
  • if one of the slaves is down, you shouldn't be prevented from writing data in the master, because as soon as the slave is back up, its state will be synchronized with master. In your case though, your write operations are dependent on both master and slave.
  • How would you then define transactionality if you're in fact using 2 dbs?

I would advise using the master DB for all the write flows, with all the statements they might require (whether they are SELECTs, UPDATEs, or INSERTs). Then, the parts of the application dealing with read-only flows can read from the slave DB.

I'd also advise having separate DAOs, each with its own methods, so that you'll have a clear distinction between read-only flows and write/update flows.

octav
  • Thanks for your thoughts. 1) Won't the Hibernate cache handle this situation? When we get to horizontal scaling, the plan is to use EHCache with Terracotta. 2) Not sure I understand why the write operations are dependent on the availability of the slave? 3) If something gets written and returns no errors, then that's good enough for me. – Deejay Dec 08 '10 at 11:59
  • 2) I was thinking that for instance, an UPDATE operation assumes an initial SELECT then an UPDATE on the object that was retrieved by the SELECT, so you might end up doing the initial SELECT on the slave and the UPDATE on the master, which may lead to difficulties defining transactions and to issues when replication is still in progress (slave doesn't contain all data that master has). – octav Dec 08 '10 at 12:06
  • 1) MySQL replication is a process independent of ORM tools, so I don't think that may help here.. – octav Dec 08 '10 at 12:08
  • 1) If Hibernate is doing the write, then when a SELECT is next requested if the cache is still alive then it won't actually hit the database, it will use the value it wrote that it stored in the cache. Use of EHCache and Terracotta should ensure that this will be consistent across appserver instances. – Deejay Dec 08 '10 at 12:36

You could create 2 session factories and have a BaseDao wrapping the 2 factories (or the 2 hibernateTemplates, if you use them), and use the get methods with one factory and the saveOrUpdate methods with the other.
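
A rough sketch of that idea, assuming the two factories are wired in from Spring configuration (the field names are illustrative; note also the caveat in the comment below about entities getting associated with two open sessions):

import java.io.Serializable;

import org.hibernate.SessionFactory;

public abstract class BaseDao {

    private final SessionFactory readSessionFactory;  // points at the slave
    private final SessionFactory writeSessionFactory; // points at the master

    protected BaseDao(SessionFactory readSessionFactory, SessionFactory writeSessionFactory) {
        this.readSessionFactory = readSessionFactory;
        this.writeSessionFactory = writeSessionFactory;
    }

    @SuppressWarnings("unchecked")
    protected <T> T get(Class<T> type, Serializable id) {
        // reads go through the slave-bound factory
        return (T) readSessionFactory.getCurrentSession().get(type, id);
    }

    protected void saveOrUpdate(Object entity) {
        // writes go through the master-bound factory
        writeSessionFactory.getCurrentSession().saveOrUpdate(entity);
    }
}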

Liviu T.
  • This is a perfect solution, but it causes an illegal attempt to associate an object with two open sessions. See http://stackoverflow.com/questions/19403066/two-hibernate-transaction-manager-illegal-attempt-to-associate-proxy-with-two-o, kindly help me – Vinit Prajapati Oct 16 '13 at 13:50

Try this way : https://github.com/kwon37xi/replication-datasource

It works nicely and is very easy to implement, without any extra annotations or code. It requires only @Transactional(readOnly=true|false).

I have been using this solution with Hibernate (JPA), Spring JDBC Template, and iBatis.
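
For context, the core trick in that library is to key a routing DataSource off Spring's readOnly transaction flag, and to wrap it in a LazyConnectionDataSourceProxy so the physical connection is only fetched after the transaction (and its readOnly flag) is in place. A condensed sketch of the idea, not the library's actual code:

import javax.sql.DataSource;

import org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;
import org.springframework.transaction.support.TransactionSynchronizationManager;

public class ReplicationRoutingDataSource extends AbstractRoutingDataSource {
    @Override
    protected Object determineCurrentLookupKey() {
        // @Transactional(readOnly = true) -> slave, anything else -> master
        return TransactionSynchronizationManager.isCurrentTransactionReadOnly()
                ? "slave" : "master";
    }
}

// When wiring it up, register "master"/"slave" as targetDataSources and wrap it:
// DataSource dataSource = new LazyConnectionDataSourceProxy(replicationRoutingDataSource);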

KwonNam
  • Can you answer http://stackoverflow.com/questions/43247593/spring-boot-read-write-split? I am using the same approach but it's not working – Ankit Bansal Apr 06 '17 at 06:41

You can use DDAL to write to the master database and read from the slave database through a DefaultDDRDataSource, without modifying your DAOs. What's more, DDAL provides load balancing across multiple slave databases, and it doesn't rely on Spring or Hibernate. There is a demo project showing how to use it: https://github.com/hellojavaer/ddal-demos, and demo1 covers exactly the scenario you described.

allen