I am working on a project in R that calls fasttext from the command line, and I am not sure how to load the output that fasttext gives me into a data frame. Here is a reproducible example:
> data.train<-data.frame(index=c(rep("__label__1",3),rep("__label__2",3)),country=c("ENGLAND","BRITAIN","UNITED KINDOM","USA","AMERICA","UNITED STATES"))
> data.train
       index       country
1 __label__1       ENGLAND
2 __label__1       BRITAIN
3 __label__1 UNITED KINDOM
4 __label__2           USA
5 __label__2       AMERICA
6 __label__2 UNITED STATES
> data.test<-c("EGLND","MURICA")
> data.test
[1] "EGLND" "MURICA"
> write.table(data.train,"data.train.txt",sep="\t",quote=FALSE,row.names=FALSE,col.names=FALSE)
>
> write.table(data.test,"data.test.txt",sep="\t",quote=FALSE,row.names=FALSE,col.names=FALSE)
>
> system("fasttext supervised -input data.train.txt -output model_data")
Read 0M words
Number of words: 8
Number of labels: 2
Progress: 0.0% words/sec/thread: 103000 lr: 0.100000 loss: 0.672343 eta: -596523h-14m
Progress: 100.0% words/sec/thread: 103000 lr: 0.000000 loss: 0.672343 eta: 0h0m
Saving model file.
> system("fasttext predict-prob model_data.bin data.test.txt 2")
__label__1 0.5 __label__2 0.498047
__label__1 0.5 __label__2 0.498047
> res<-system("fasttext predict-prob model_data.bin data.test.txt 2", intern=TRUE)
> res
[1] "__label__1 0.5 __label__2 0.498047" "__label__1 0.5 __label__2 0.498047"
The original system call simply prints the fasttext output to the console, which is the problem. As suggested in the comments, adding intern=TRUE lets me capture the output in the variable res, but res is just a character vector of strings. What I actually need is a data frame with one column of probabilities per label, like this:
> want
  __label__1 __label__2
1        0.5   0.498047
2        0.5   0.498047
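For reference, here is a rough base R sketch of the kind of conversion I mean. It assumes every line of res is a whitespace-separated sequence of alternating label and probability tokens, and it matches values to labels by name, on the assumption that fasttext may list the labels in a different order on different lines. The names tokens, labels, rows and res.df are only illustrative.

```r
# Output captured with system(..., intern = TRUE), copied from the session above
res <- c("__label__1 0.5 __label__2 0.498047",
         "__label__1 0.5 __label__2 0.498047")

# Split each line on whitespace: odd tokens are labels, even tokens are probabilities
tokens <- strsplit(trimws(res), "\\s+")

# Collect every label that appears anywhere in the output
labels <- sort(unique(unlist(lapply(tokens, function(x) x[c(TRUE, FALSE)]))))

# For each line, build a numeric vector ordered by label name
# (a label missing from a line would become NA)
rows <- lapply(tokens, function(x) {
  probs <- as.numeric(x[c(FALSE, TRUE)])
  names(probs) <- x[c(TRUE, FALSE)]
  unname(probs[labels])
})

# Bind the rows into a matrix, then convert to a data frame with the label names
res.df <- as.data.frame(do.call(rbind, rows))
names(res.df) <- labels
res.df
```

This works on the toy output, but I am not sure it is the idiomatic way to do it, or whether it holds up for labels or outputs formatted differently from the example.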
The question "Fasttext how to load a .csv column into model.predict" answers something similar, but for Python, and I need to do this in R.