I'm building character-based autocomplete functionality for a mobile app, and I'm testing my implementation by predicting the next character on a Nietzsche data set.
The data set I'm using is:
http://evolve.drawcast.com/nietzsche_train.txt
http://evolve.drawcast.com/nietzsche_test.txt
I'm seeing a next-character prediction accuracy of 57.6% on the test set using a naive implementation (basically a frequency tree that steps back from the end of the string).
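For concreteness, this kind of back-off frequency model can be sketched roughly as follows. It is a simplified illustration under stated assumptions, not the exact implementation behind the 57.6% figure: the names `BackoffPredictor`, `max_order`, and `accuracy` are hypothetical, and a flat dictionary keyed by context string stands in for an explicit tree.

```python
from collections import Counter, defaultdict


class BackoffPredictor:
    """Character n-gram frequency model that backs off to shorter contexts."""

    def __init__(self, max_order=5):
        # max_order is an assumed cap on context length, not a value from the post.
        self.max_order = max_order
        # counts[context][next_char] = how often next_char followed context
        self.counts = defaultdict(Counter)

    def train(self, text):
        for i in range(len(text)):
            # Record text[i] under every context length from 0 up to max_order.
            for n in range(min(self.max_order, i) + 1):
                self.counts[text[i - n:i]][text[i]] += 1

    def predict(self, prefix):
        # Try the longest suffix of the prefix first, then step back toward "".
        for n in range(min(self.max_order, len(prefix)), -1, -1):
            context = prefix[len(prefix) - n:]
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
        return None  # model has not been trained


def accuracy(model, text):
    """Fraction of positions where the top prediction equals the next character."""
    if not text:
        return 0.0
    hits = 0
    for i in range(len(text)):
        # Only the last max_order characters can influence the prediction.
        context = text[max(0, i - model.max_order):i]
        hits += model.predict(context) == text[i]
    return hits / len(text)
```

Usage would be along the lines of `model.train(train_text)` followed by `accuracy(model, test_text)`, where `train_text` and `test_text` hold the contents of the two files above.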
What I'm wondering is: can I do better than this with reasonable effort?
I'm willing to go the RNN/LSTM route, but the following write-up shows a character prediction accuracy of roughly 56% on the same (or a similar) data set.
http://curiousily.com/data-science/2017/05/23/tensorflow-for-hackers-part-5.html
Unfortunately, I can't find any other explicitly reported next-character prediction results for text sequence data.
I'm ready to be inspired to take on RNNs (or something else) if I can see that they perform much better than what I have. Does anyone have an implementation they can quickly test on the above data, or know of a result I can compare against?