After a lot of reading and research, I realized that I was tackling the problem the wrong way. In handwriting recognition it is very hard to first segment the characters and then recognize them.
As the paradox goes (Sayre's paradox): a letter cannot be segmented before it has been recognized, and cannot be recognized before it has been segmented.
So the correct way is to treat the problem as "Supervised Sequence Labelling".
What distinguishes such problems from the traditional framework of supervised pattern classification is that the individual data points cannot be assumed to be independent. Instead, both the inputs and the labels form strongly correlated sequences.
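To make this concrete: a sequence labeller emits one output per frame (e.g. per column of pixels), and the transcription is read off the whole output sequence, so no character boundaries are ever computed. The sketch below assumes the best-path (greedy) decoding rule from Connectionist Temporal Classification (CTC): take the most likely symbol per frame, merge consecutive repeats, then drop a special "blank" symbol. `BLANK = 0` and `best_path_decode` are hypothetical names chosen for illustration.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 is the CTC "blank"; index i >= 1 maps to alphabet[i - 1]


def best_path_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Most likely symbol index at each time step (frame).
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out: List[str] = []
    prev: Optional[int] = None
    for idx in best:
        # Emit only when the symbol changes and is not blank; a blank between
        # two identical symbols is what allows double letters such as "ll".
        if idx != prev and idx != BLANK:
            out.append(alphabet[idx - 1])
        prev = idx
    return "".join(out)


def one_hot(i: int, n: int = 5) -> List[float]:
    # Toy per-frame distribution that strongly favours symbol i.
    return [0.9 if j == i else 0.025 for j in range(n)]


# Ten frames over "helo" (1=h, 2=e, 3=l, 4=o); the blank splits the two l's.
frames = [one_hot(i) for i in [1, 1, 2, 0, 3, 3, 0, 3, 4, 4]]
print(best_path_decode(frames, "helo"))  # hello
```

Best-path decoding is only an approximation: the single most likely frame path does not always collapse to the most likely label sequence, which is why beam-search decoders are often used in practice.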
I would suggest the paper here, which uses Multi-Dimensional RNNs (MDRNNs) together with Connectionist Temporal Classification (CTC).
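The reason CTC sidesteps the segmentation paradox is that it scores a transcription by summing over every possible alignment of labels to frames, instead of committing to one segmentation. A minimal sketch of that sum, assuming the standard 1-D CTC forward recursion (not the MDRNN network from the paper), with `ctc_forward` and `brute_force` as hypothetical helper names, `probs` as a T x K list of per-frame softmax outputs, and index 0 as the blank:

```python
from itertools import product
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_forward(probs: Sequence[Sequence[float]], label: Sequence[int]) -> float:
    """p(label | input), summed over all alignments via the forward recursion."""
    ext: List[int] = [BLANK]
    for c in label:
        ext += [c, BLANK]  # extended label: blank, l1, blank, l2, ..., blank
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][BLANK]  # start with a blank...
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]  # ...or with the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]  # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1][s - 1]  # advance by one position
            # Skipping a blank is only allowed between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]
    # Valid paths end on the last label or on the trailing blank.
    return alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)


def collapse(path: Sequence[int]) -> List[int]:
    # CTC collapse rule: merge repeats, then remove blanks.
    out: List[int] = []
    prev = None
    for k in path:
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return out


def brute_force(probs: Sequence[Sequence[float]], label: Sequence[int]) -> float:
    # Explicitly enumerate every frame-level path (every "segmentation").
    total = 0.0
    for path in product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path) == list(label):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total


probs = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.3, 0.2, 0.5], [0.6, 0.2, 0.2]]
print(ctc_forward(probs, [1, 2]), brute_force(probs, [1, 2]))  # equal values
```

The brute-force version is exponential in the number of frames; the forward recursion computes the same quantity in O(T·S) time, which is what makes training on unsegmented text lines practical.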