So, I'm implementing a Markov random text generator in Java. I've gotten as far as extracting the n-grams from the text file, but now I'm struggling to write a class that counts how many times each n-gram occurs in the text (and, eventually, computes its probability).
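Counting occurrences is essentially a frequency-table problem: walk over the n-grams and keep a `Map` from each n-gram to the number of times it has been seen. Below is a minimal sketch that assumes whitespace-separated words and uses a hypothetical standalone `NgramCounter` class. Its `probability` method returns only the relative frequency, meaning that n-gram's count divided by the total number of n-grams.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only
public class NgramCounter {

    // Counts how many times each word n-gram occurs in the text
    // (assumes words are separated by whitespace; punctuation stays attached)
    static Map<String, Integer> countNgrams(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty() || n < 1) {
            return counts;
        }
        List<String> words = Arrays.asList(trimmed.split("\\s+"));
        for (int i = 0; i + n <= words.size(); i++) {
            String gram = String.join(" ", words.subList(i, i + n));
            // Start the count at 1, or add 1 to the existing count
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    // Relative frequency: occurrences of this n-gram / total number of n-grams
    static double probability(Map<String, Integer> counts, String gram) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total == 0 ? 0.0 : counts.getOrDefault(gram, 0) / (double) total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countNgrams("the cat sat on the cat mat", 2);
        System.out.println(counts.get("the cat"));          // 2
        System.out.println(probability(counts, "the cat")); // 0.3333333333333333
    }
}
```

With the existing `Ngram` iterator, you could feed the same `merge` call directly instead of re-splitting the text. For example: `while (test.hasNext()) counts.merge(test.next(), 1, Integer::sum);`.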
This is the code I have so far. It's a little messy, but it's only a rough draft.
// Main file: reads the text file and creates a new Ngram object from its contents
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class Markov {

    // Reads the whole file into a String (returns "" if it can't be read)
    public static String readCorpusToString(File file) {
        String corpus = "";
        try {
            corpus = new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return corpus;
    }

    public static void main(String[] args) {
        File text = new File(args[0]);
        String corpus = readCorpusToString(text);
        //System.out.println(corpus);
        Ngram test = new Ngram(3, corpus);
        // Loop over the n-grams, not over the characters of the corpus:
        // looping up to corpus.length() runs past the end of the word array
        while (test.hasNext()) {
            System.out.println(test.next());
        }
    }
}
And here's the class for my n-gram object:
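import java.util.Iterator;
import java.util.NoSuchElementException;

// Iterates over the word n-grams of a string, advancing one word at a time
public class Ngram implements Iterator<String> {
    private final String[] words;
    private final int n;
    private int pos = 0;

    public Ngram(int n, String str) {
        this.n = n;
        // Split on any run of whitespace (spaces, tabs, newlines), not just single spaces
        words = str.trim().split("\\s+");
    }

    @Override
    public boolean hasNext() {
        return pos < words.length - n + 1;
    }

    @Override
    public String next() {
        // The Iterator contract requires this when there are no more elements
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        StringBuilder sb = new StringBuilder();
        for (int i = pos; i < pos + n; i++) {
            if (i > pos) {
                sb.append(' ');
            }
            sb.append(words[i]);
        }
        pos++;
        return sb.toString();
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}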
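For text generation, the probability that usually matters is conditional. Given the first n-1 words of an n-gram (the prefix), how likely is each possible next word? That is, P(next | prefix) = count(prefix followed by next) / count(prefix followed by anything). The sketch below uses a hypothetical `MarkovTable` class, assumes whitespace-separated words and n >= 1, stores prefix → (next word → count), and samples the next word weighted by those counts.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical class, for illustration only
public class MarkovTable {

    // prefix (first n-1 words of each n-gram) -> (next word -> count)
    private final Map<String, Map<String, Integer>> transitions = new HashMap<>();

    MarkovTable(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return;
        }
        List<String> words = Arrays.asList(trimmed.split("\\s+"));
        for (int i = 0; i + n <= words.size(); i++) {
            String prefix = String.join(" ", words.subList(i, i + n - 1));
            String next = words.get(i + n - 1);
            transitions.computeIfAbsent(prefix, k -> new HashMap<>())
                       .merge(next, 1, Integer::sum);
        }
    }

    // Total number of times the prefix was followed by any word
    private static int total(Map<String, Integer> followers) {
        int total = 0;
        for (int c : followers.values()) {
            total += c;
        }
        return total;
    }

    // P(next | prefix); 0.0 if the prefix never occurred
    double probability(String prefix, String next) {
        Map<String, Integer> followers = transitions.get(prefix);
        if (followers == null) {
            return 0.0;
        }
        return followers.getOrDefault(next, 0) / (double) total(followers);
    }

    // Picks a next word at random, weighted by how often it followed the prefix
    // (returns null if the prefix never occurred)
    String sampleNext(String prefix, Random rng) {
        Map<String, Integer> followers = transitions.get(prefix);
        if (followers == null) {
            return null;
        }
        int r = rng.nextInt(total(followers)); // 0 <= r < total
        for (Map.Entry<String, Integer> e : followers.entrySet()) {
            r -= e.getValue();
            if (r < 0) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("unreachable: r is always below the total");
    }

    public static void main(String[] args) {
        MarkovTable table = new MarkovTable("the cat sat on the mat and the cat ran", 2);
        System.out.println(table.probability("the", "cat")); // 0.6666666666666666
        System.out.println(table.sampleNext("the", new Random(42)));
    }
}
```

Generating text then becomes a loop. Start from some prefix, call `sampleNext`, append the returned word, and slide the prefix forward by one word. Because an unseen prefix returns `null`, the caller decides whether to stop or restart from another prefix.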