The state has a very specific meaning and purpose. This isn't a question of what is "advisable"; there is a right and a wrong answer here, and it depends on your data.
Consider each time step in your sequence of data. At the first time step your state should be initialized to all zeros. This value has a specific meaning: it tells the network that this is the beginning of your sequence.
At each time step the RNN computes a new state. TensorFlow's RNN functions (for example tf.nn.dynamic_rnn running your MultiRNNCell) hide this from you, but internally a new hidden state is computed at each time step and passed forward to the next one.
The value of state at the 2nd time step is the new state computed at the 1st time step, and so on.
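To make the recurrence concrete, here is a minimal sketch in plain Python, assuming a toy RNN with a single hidden unit and hypothetical scalar weights (w_x, w_h). It is not TensorFlow's cell implementation, just the same idea: start from a zero state and feed each step's new state into the next step.

```python
import math


def rnn_step(x, h, w_x=0.5, w_h=0.8):
    # One step of a toy single-unit RNN: h_new = tanh(w_x * x + w_h * h).
    # The weights are hypothetical; a real model learns them.
    return math.tanh(w_x * x + w_h * h)


def run_rnn(inputs, initial_state=0.0):
    # Unroll the toy RNN over a sequence, passing the state forward.
    # Returns the state after every step and the final state.
    h = initial_state  # zero means "this is the start of a sequence"
    states = []
    for x in inputs:
        h = rnn_step(x, h)  # the state produced at step t is the input state at step t+1
        states.append(h)
    return states, h


states, final_state = run_rnn([1.0, 0.0, -1.0])
```

Here the state after step 2 is computed from the state after step 1, which in turn was computed from the zero initial state.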
So the answer to your question is yes, but only if the next batch continues in time from the previous batch. Let me explain this with two examples: one where you do carry the state forward and one where you don't.
Example 1: say you are training a character RNN, a common tutorial example where your input is every character in the works of Shakespeare. There are millions of characters in this sequence, and you can't train on a sequence that long. So you break your sequence into segments of 100 (if you have no specific reason to do otherwise, limit your sequences to roughly 100 time steps). In this example each training step is a sequence of 100 characters that continues from the previous 100 characters, so you must carry the state forward to the next training step.
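The sketch below illustrates Example 1 under the same toy single-unit RNN assumption (hypothetical weights, and a synthetic input stream standing in for encoded characters). It processes a 1000-step stream in segments of 100 and either carries the state across segment boundaries or resets it. With the state carried, the final state matches a single pass over the whole stream; with resets, every segment wrongly starts from zero as if it were a new sequence.

```python
import math


def rnn_step(x, h, w_x=0.5, w_h=0.8):
    # Toy single-unit RNN cell with hypothetical weights.
    return math.tanh(w_x * x + w_h * h)


def run_segment(segment, initial_state):
    # Unroll over one segment and return the state after its last step.
    h = initial_state
    for x in segment:
        h = rnn_step(x, h)
    return h


def run_in_segments(sequence, seg_len=100, carry_state=True):
    # Process a long sequence seg_len steps at a time.
    # Returns (final_state, initial_states), where initial_states[i] is the
    # state that the i-th segment started from.
    h = 0.0  # zeros only at the very start of the whole sequence
    initial_states = []
    for start in range(0, len(sequence), seg_len):
        if not carry_state:
            h = 0.0  # wrong for continuing data: forgets everything seen so far
        initial_states.append(h)
        h = run_segment(sequence[start:start + seg_len], h)
    return h, initial_states


# Hypothetical long input stream (stand-in for an encoded text).
long_sequence = [math.sin(0.1 * t) for t in range(1000)]
final_carried, inits_carried = run_in_segments(long_sequence, 100, carry_state=True)
final_reset, inits_reset = run_in_segments(long_sequence, 100, carry_state=False)
```

Only the forward pass is shown. In actual training the gradients are typically cut at each segment boundary (truncated backpropagation through time), while the state value itself still flows forward.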
Example 2: a case where you don't do this is training an RNN to recognize MNIST handwritten digits. Here you split each image into 28 rows of 28 pixels, so each training sequence has only 28 time steps, one per row of the image. Each training iteration starts at the beginning of the sequence for that image and runs through to its end. You would not carry the hidden state forward in this case: your hidden state must start at zeros to tell the network that this is the beginning of a new image sequence, not the continuation of the last image you trained on.
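For Example 2, a sketch under the same toy assumptions: a single-unit RNN with hypothetical weights reads a 28x28 image one row per time step, and every image starts from a fresh zero state. Summing each row into a single input is a simplification for illustration only; a real model would take the full 28-pixel row as its input vector.

```python
import math


def rnn_step(row, h, w_x=0.01, w_h=0.8):
    # Toy single-unit cell: each time step consumes one 28-pixel image row.
    # Summing the row is a simplification; the weights are hypothetical.
    return math.tanh(w_x * sum(row) + w_h * h)


def encode_image(image):
    # image: 28 rows of 28 pixel values, one time step per row.
    h = 0.0  # fresh zero state: this image is a brand-new sequence
    for row in image:
        h = rnn_step(row, h)
    return h  # final state, which a classifier layer would read


def encode_batch(images):
    # Each image is encoded independently; no state is carried between images.
    return [encode_image(img) for img in images]


# Two hypothetical 28x28 "images": one blank, one with a bright horizontal band.
blank = [[0.0] * 28 for _ in range(28)]
banded = [[1.0 if 10 <= r < 18 else 0.0] * 28 for r in range(28)]
encoded = encode_batch([banded, blank])
```

Because the state is reset for each image, the blank image encodes to exactly 0.0 even though it comes right after the banded one; carrying the banded image's final state into it would leak information from an unrelated image.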
I hope these two examples illustrate the important difference. If your sequences are very long (say, over ~100 time steps), you need to break them up and think through how to carry the state forward appropriately; you can't effectively train on arbitrarily long sequences. If your sequences are under this rough threshold, you don't need to worry about this detail and can always initialize your state to zeros.
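One detail when carrying the state forward with more than one sequence per batch: the continuation has to hold row by row, so row b of the next batch must start exactly where row b of the current batch ended. Below is a minimal sketch of one common way to lay out the data, using a hypothetical helper make_stateful_batches that cuts the stream into batch_size contiguous lanes and then slices each lane into windows of num_steps.

```python
def make_stateful_batches(stream, batch_size, num_steps):
    # Hypothetical helper: arrange one long stream so that each batch row
    # continues the same row of the previous batch.
    # Trailing items that don't fill a whole lane or window are dropped.
    lane_len = len(stream) // batch_size
    lanes = [stream[b * lane_len:(b + 1) * lane_len] for b in range(batch_size)]
    num_batches = lane_len // num_steps
    # Batch i holds steps [i*num_steps, (i+1)*num_steps) of every lane.
    return [
        [lane[i * num_steps:(i + 1) * num_steps] for lane in lanes]
        for i in range(num_batches)
    ]


stream = list(range(24))  # stand-in for an encoded character stream
batches = make_stateful_batches(stream, batch_size=2, num_steps=4)
```

With this layout, the final state for row b of one batch is the correct initial state for row b of the next; shuffling the batches or their rows would break that continuity.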
Also know that even though you only train on, say, 100 time steps at a time, the RNN can still learn patterns that span much longer sequences. Karpathy's excellent blog post "The Unreasonable Effectiveness of Recurrent Neural Networks" demonstrates this beautifully: those character-level RNNs can keep track of important details, such as whether a quote is open, over many hundreds of characters, far more than they were ever trained on in one batch, precisely because the hidden state was carried forward appropriately.