Predicting Characters From Nietzsche Data Set

Question

I'm building character-based autocomplete functionality for a mobile app and am testing my implementation by predicting characters on a Nietzsche data set.

The data sets I'm using are:

http://evolve.drawcast.com/nietzsche_train.txt
http://evolve.drawcast.com/nietzsche_test.txt

I'm seeing next-character prediction accuracy of 57.6% on the test set using a naive implementation (basically a tree of character frequencies that steps back from the end of the string to shorter contexts).
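
To make "a tree of frequencies that steps back from the end of the string" concrete, here is a minimal sketch of that kind of model. It is an illustration under assumptions, not the exact code behind the 57.6% figure: the class name `BackoffCharModel`, the `max_order` parameter, and the use of a flat dictionary keyed by context string (rather than an explicit tree) are all hypothetical choices.

```python
from collections import Counter, defaultdict


class BackoffCharModel:
    """Character frequency model that backs off to shorter contexts."""

    def __init__(self, max_order=8):
        # Longest context (in characters) to remember; 8 is an assumed default.
        self.max_order = max_order
        # counts[context] -> Counter of characters seen right after that context
        self.counts = defaultdict(Counter)

    def train(self, text):
        for i, ch in enumerate(text):
            # Record ch under every context length from 0 (no context) up to max_order.
            for n in range(min(i, self.max_order) + 1):
                self.counts[text[i - n:i]][ch] += 1

    def predict(self, history):
        # Try the longest suffix of the history first, then step back to shorter ones.
        for n in range(min(len(history), self.max_order), -1, -1):
            context = history[len(history) - n:]
            followers = self.counts.get(context)
            if followers:
                return followers.most_common(1)[0][0]
        return None  # nothing has been trained yet


model = BackoffCharModel(max_order=3)
model.train("abcabcabd")
print(model.predict("ab"))  # most frequent character seen after "ab"
```

The empty context acts as the final fallback, so a trained model always returns some character even for a history it has never seen.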

What I'm wondering is: can I do better with reasonable effort?

I'm willing to go the RNN/LSTM route, but the following post reports character prediction accuracy of roughly 56% on the same (or a similar) data set:

http://curiousily.com/data-science/2017/05/23/tensorflow-for-hackers-part-5.html

Unfortunately, I can't find any other explicitly reported character prediction results for text sequence data.

I'm ready to be inspired to take on RNNs (or something else) if I can see that they perform much better than what I have. Does anyone have an implementation they can quickly test on the above data, or know of a result I can compare against?

Why would you want to predict a single character? More useful would be to predict the whole word. — MrSmith42, Nov 15 '17 at 10:10
You should add some details about how you measure the prediction performance. Do you already try to predict the 2nd character after the first is typed? Add some details about what the use case of the autocomplete functionality will be. — MrSmith42, Nov 15 '17 at 10:13
The character predictions will be used to invisibly adjust the bounding rectangles of a custom keyboard's keys (making more likely characters more likely to be tapped on). In my stated prediction performance, I assume that the 8 previous characters are available to predict the next one (as such the first 8 characters of the test set are not predicted). I've updated the data sets to remove newlines to keep things consistent with my test results. — OnesAndZeroes, Nov 15 '17 at 13:50
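
The measurement described in the comment above (predict each character from the 8 before it, skipping the first 8 characters of the test set) can be sketched roughly as follows. This is only an assumed reading of that protocol: the function names `next_char_accuracy` and `most_frequent_char_baseline` are hypothetical, and the baseline predictor is a trivial stand-in so that the example runs on its own, not any model discussed here.

```python
from collections import Counter


def next_char_accuracy(predict, text, context_len=8):
    """Fraction of characters in `text` that `predict` gets right.

    `predict` is any callable mapping the previous `context_len` characters
    to one predicted character. The first `context_len` characters are
    skipped because they have no full context.
    """
    total = len(text) - context_len
    if total <= 0:
        raise ValueError("text must be longer than context_len")
    correct = sum(
        predict(text[i - context_len:i]) == text[i]
        for i in range(context_len, len(text))
    )
    return correct / total


def most_frequent_char_baseline(train_text):
    """Stand-in predictor: always guess the most common training character."""
    guess = Counter(train_text).most_common(1)[0][0]
    return lambda context: guess


# Tiny made-up strings in place of the real train and test files.
predict = most_frequent_char_baseline("the quick brown fox jumps over the lazy dog")
print(next_char_accuracy(predict, "a short test sentence to score", context_len=8))
```

Any model that exposes the same "context in, character out" callable can be scored this way, which keeps comparisons between a frequency tree and an RNN on equal footing.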

0 Answers