I want to load the flat text file passed in as 'TMFlatFile' (a .tsv file in the format MALLET expects) into the fileReader variable. I have written the method RunTopicModelling() but am having a problem with the try/catch block: I have created my File and FileInputStream objects, but don't know how to load the file correctly into fileReader.
I get the compile error "The method read(CharBuffer) in the type InputStreamReader is not applicable for the arguments (int)".
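For context on that error: InputStreamReader has no read(int) method. Its read overloads are read() (returns the next character as an int, or -1 at end of stream), read(char[], int, int), and read(CharBuffer) inherited from Reader, which is the one the compiler reports. An InputStreamReader is not a buffer you copy bytes into; it is a decoder you wrap around the byte stream. Also note that in the code below fileReader is never assigned, so it would still be null at runtime. A minimal sketch of the wrapping approach follows, assuming the file is UTF-8. The class ReaderFromFile and its methods are hypothetical names for illustration. (FileWriter in the question writes in the platform default charset, so on systems where that is not UTF-8 you would want the two to match.)

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReaderFromFile {

    // Wrap the byte stream in an InputStreamReader; the reader decodes
    // bytes to characters on demand, so no manual copy loop is needed.
    // Assumption: the file is encoded as UTF-8.
    static Reader openReader(String path) throws IOException {
        return new InputStreamReader(new FileInputStream(new File(path)), StandardCharsets.UTF_8);
    }

    // Drains a Reader; read() returns one char as an int, or -1 at end of stream.
    // Only for demonstration: with MALLET you would hand the Reader to CsvIterator instead.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("topicdata", ".txt");
        Files.write(tmp, "doc1\ten\tsome lemmas\r\n".getBytes(StandardCharsets.UTF_8));
        try (Reader reader = openReader(tmp.toString())) {
            System.out.print(readAll(reader));
        }
        Files.delete(tmp);
    }
}
```

Applied to the question's code, this would mean replacing the byte loop with something like fileReader = new InputStreamReader(new FileInputStream(inFile), StandardCharsets.UTF_8); and passing fileReader straight to CsvIterator, closing it after addThruPipe has consumed the iterator.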
```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Map.Entry;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.pipe.iterator.CsvIterator;
import cc.mallet.types.InstanceList;

public class TopicModelling {

    private void StartTopicModellingProcess(String filePath) {
        JSONIOHelper jsonIO = new JSONIOHelper();
        jsonIO.LoadJSON(filePath);
        ConcurrentHashMap<String, String> lemmas = jsonIO.GetDocumentsFromJSONStructure();
        SaveLemmaDataToFile("topicdata.txt", lemmas);
    }

    private void SaveLemmaDataToFile(String TMFlatFile, ConcurrentHashMap<String, String> lemmas) {
        // Open the writer once, outside the loop; a new FileWriter per entry
        // would truncate the file each time and keep only the last entry
        try (FileWriter writer = new FileWriter(TMFlatFile)) {
            for (Entry<String, String> entry : lemmas.entrySet()) {
                // one line per document: name <TAB> label <TAB> text
                writer.write(entry.getKey() + "\ten\t" + entry.getValue() + "\r\n");
            }
        } catch (Exception e) {
            System.out.println("Saving to flat text file failed...");
        }
    }

    private void RunTopicModelling(String TMFlatFile, int numTopics, int numThreads, int numIterations) {
        ArrayList<Pipe> pipeList = new ArrayList<Pipe>();
        // Pipes: tokenise, map to features
        pipeList.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")));
        pipeList.add(new TokenSequence2FeatureSequence());
        InstanceList instances = new InstanceList(new SerialPipes(pipeList));

        InputStreamReader fileReader = null;
        // load the file passed in via TMFlatFile into fileReader - this is the block I have a problem with
        try {
            File inFile = new File(TMFlatFile);
            FileInputStream fis = new FileInputStream(inFile);
            int line;
            while ((line = fis.read()) != -1) {
                fileReader.read(line); // this line gives the compile error above
            }
            fis.close();
        } catch (Exception e) {
            System.out.println("File Load Failed");
            System.exit(1);
        }

        // linking data to the pipeline
        instances.addThruPipe(new CsvIterator(fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"), 3, 2, 1));
    }
}
```
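It may also help to check what the CsvIterator line regex does with each line SaveLemmaDataToFile writes. With the arguments 3, 2, 1, CsvIterator takes group 3 as the instance data, group 2 as the target (label) and group 1 as the name. The sketch below applies the same pattern with plain java.util.regex, assuming each line reaches the regex with its "\r\n" already stripped; the class TsvLineSplitDemo and its split method are hypothetical, not part of MALLET.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TsvLineSplitDemo {

    // Same pattern the question passes to CsvIterator:
    // group 1 = name, group 2 = label, group 3 = rest of the line (data)
    static final Pattern LINE = Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$");

    // Returns {name, label, data}, or null if the line does not match
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // A line in the layout SaveLemmaDataToFile writes: key, "en", lemmas
        String[] parts = split("doc1\ten\tcat sat mat");
        System.out.println("name=" + parts[0] + " label=" + parts[1] + " data=" + parts[2]);
        // prints "name=doc1 label=en data=cat sat mat"
    }
}
```

Because groups 1 and 2 are \S*, a document key containing whitespace would be split across the name and label fields (for "my doc\ten\tx" the name becomes "my" and the label "doc"), so the keys from the JSON should be whitespace-free.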
Can someone tell me the correct way to do this?
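On the writing side, the file only contains every document if the writer is opened once for the whole map; a FileWriter created per entry (without append mode) truncates the file on each iteration. A minimal sketch of writing the name, label, text layout, assuming UTF-8 and using java.nio.file instead of FileWriter. TsvWriterSketch, writeTsv and its label parameter are hypothetical names; the question hardcodes "en" as the label.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

class TsvWriterSketch {

    // Writes one "name<TAB>label<TAB>text" line per map entry.
    // The writer is opened once, outside the loop, so no entry overwrites another.
    static void writeTsv(Path file, Map<String, String> docs, String label) throws IOException {
        try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, String> entry : docs.entrySet()) {
                writer.write(entry.getKey() + "\t" + label + "\t" + entry.getValue() + "\r\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // TreeMap only to make the demo order predictable; any Map works
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "cat sat mat");
        docs.put("doc2", "dog ran far");
        Path tmp = Files.createTempFile("topicdata", ".tsv");
        writeTsv(tmp, docs, "en");
        System.out.println(Files.readAllLines(tmp, StandardCharsets.UTF_8).size()); // prints 2
        Files.delete(tmp);
    }
}
```

A ConcurrentHashMap can be passed unchanged, since it implements Map; its iteration order is unspecified, which does not matter for MALLET import.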