
I'm new to programming and R. I'm trying to use the `wordfish` function in the austin package. I created a term-document matrix from a corpus, but the call to `wordfish` fails:

    library(tm)
    library(austin)
    text.corpus.format<-VCorpus(DirSource("MyDirectory"))

    # Create word frequency matrix
    wordfreqmatrix<-TermDocumentMatrix(text.corpus.format)
    wcdata<-as.matrix(wordfreqmatrix) # CONVERT WORD COUNT MATRIX FOR USE WITH WORDFISH
    wcdata<-t(wcdata) # TRANSPOSE TERM DOC MATRIX
    as.matrix(as.data.frame(wcdata)) # ASSIGN DOC TITLES TO MATRIX
    rownames(wcdata)<-lapply(text.corpus.format,Author)

    # Problematic command and the resulting error:
    wordfish(input=wcdata,dir=c(221,223))
    Error in wordfish(input = wcdata, dir = c(221, 223)) : 
    unused argument (input = wcdata)

The correct usage for the wordfish function is wordfish(wfm,dir=c(1,10)). I thought I defined wcdata as a word frequency matrix, but I must have done something wrong. Any insight is greatly appreciated!

  • Well, presumably this error occurs because `input` is not a valid argument to the `wordfish(...)` function. Try: `wordfish(wfm=wcdata, dir=c(221,223))`. If you are new to programming, I wouldn't start with R. – jlhoward Jun 13 '14 at 20:57
  • Thanks for your feedback! @jlhoward With that call the error now says: `wordfish(wfm=wcdata, dir=c(221,223)) Error in wordfish(wfm = wcdata, dir = c(221, 223)) : First argument must be an object of type wfm`. I've used SPSS and Stata before, so I'm trying to give R a shot. Thanks for your help! – user3738982 Jun 16 '14 at 19:27

1 Answer


The problem is that there are different implementations of wordfish. As listed at http://www.wordfish.org/software.html, there is the "original" version and a version implemented in the austin package. The "original" version has a parameter named `input=`, whereas the austin implementation uses a parameter named `wfm=`.

If you hadn't named the argument and had simply passed it as the first thing to the function, it would have worked, because R matches unnamed arguments by position. Once you name an argument, though, the name takes precedence over position, and a name that matches no parameter produces the "unused argument" error.

So either take off the name, or use the correct name for the austin package (`wfm=`).
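
The matching rule can be tried out in base R without either wordfish implementation. This is only a sketch: `toy_wordfish` is a hypothetical stand-in that happens to share the parameter names `wfm` and `dir`, not the austin function.

```r
# Hypothetical stand-in whose first parameter is named `wfm`.
toy_wordfish <- function(wfm, dir = c(1, 10)) {
  sprintf("%d rows, dir = %d,%d", nrow(wfm), dir[1], dir[2])
}

m <- matrix(1:6, nrow = 3)

# Positional: the first unnamed value is bound to `wfm`.
by_position <- toy_wordfish(m, dir = c(1, 2))

# Correct name: same result as the positional call.
by_name <- toy_wordfish(wfm = m, dir = c(1, 2))

# Wrong name: `input` matches no parameter, so R stops with an error.
wrong_name <- tryCatch(
  toy_wordfish(input = m, dir = c(1, 2)),
  error = function(e) conditionMessage(e)
)
print(wrong_name)
```

The first two calls return the same string, while the third is rejected before the function body ever runs.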

Also the package is looking for particular names on the object passed in. You can ensure you are passing a wfm object by running your data through the wfm function. I'm not sure what the 'dir' parameter is for but I had to set it as well to get this minimal example running.

    library(tm)
    library(austin)

    docs <- c(D1 = "look at all the words in the document",
        D2 = "i hope this document has more words than the other document")
    text.corpus.format <- Corpus(VectorSource(docs))

    # Terms in rows, documents in columns
    wordfreqmatrix <- TermDocumentMatrix(text.corpus.format)
    # Convert the plain count matrix to a wfm object
    wcdata <- wfm(as.matrix(wordfreqmatrix))

    wordfish(wcdata, dir=c(1,2))

    # Call:
    #   wordfish(wfm = wfm(wcdata), dir = c(1, 2))
    # 
    # Document Positions:
    #   Estimate Std. Error    Lower    Upper
    # 1  -1.0378     0.4832 -1.98476 -0.09078
    # 2   0.8763     0.4322  0.02917  1.72351
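
The "First argument must be an object of type wfm" message quoted in the comments is the kind of error a class check produces when it is handed a plain matrix. Here is a minimal base-R sketch of that pattern. Both `as_toy_wfm` and `toy_fit` are hypothetical names made up for illustration; austin's own internals are not shown here and may differ.

```r
# Hypothetical constructor: tags a count matrix with an S3 class.
as_toy_wfm <- function(m) {
  stopifnot(is.matrix(m))
  class(m) <- c("toy_wfm", class(m))
  m
}

# Hypothetical model function that refuses anything without the tag.
toy_fit <- function(x) {
  if (!inherits(x, "toy_wfm")) {
    stop("First argument must be an object of type toy_wfm")
  }
  dim(x)
}

counts <- matrix(c(2, 0, 1, 3), nrow = 2,
                 dimnames = list(words = c("document", "words"),
                                 docs = c("D1", "D2")))

# A plain matrix is rejected; the tagged one is accepted.
plain_result <- tryCatch(toy_fit(counts),
                         error = function(e) conditionMessage(e))
tagged_result <- toy_fit(as_toy_wfm(counts))
```

The numbers in `counts` are identical in both calls. Only the class tag differs, which is why converting with a constructor such as `wfm()` matters even when the matrix already looks right.
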
MrFlick
  • Thanks for replying! I attempted both ways and received the error: `wordfish(wfm=wcdata, dir=c(221,223)) Error in wordfish(wfm = wcdata, dir = c(221, 223)) : First argument must be an object of type wfm` and `wordfish(t(as.matrix(TermDocumentMatrix(text.corpus.format))),dir=c(221,223)) Error in wordfish(t(as.matrix(TermDocumentMatrix(text.corpus.format))), : First argument must be an object of type wfm` – user3738982 Jun 16 '14 at 23:43
  • Thank you! Adding `wcdata<-wfm(as.matrix(wordfreqmatrix))` and running `wordfish(wcdata,dir=c(221,223))` did the trick. – user3738982 Jun 18 '14 at 16:53
  • `dir` sets the (arbitrary identifying) sign of the inferred latent scale. It's documented in `?wordfish` and also in the vignette. – conjugateprior Feb 06 '15 at 21:47
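
To illustrate the comment on `dir`: the direction of a latent scale is arbitrary, so something has to pin it down. The sketch below assumes that `dir = c(a, b)` asks for document `a` to be placed below document `b`, which is my reading of `?wordfish` and should be checked there. `orient_scale` is a hypothetical helper, not part of austin.

```r
# Hypothetical helper: flip the scale if needed so that
# theta[dir[1]] < theta[dir[2]].
orient_scale <- function(theta, dir) {
  if (theta[dir[1]] > theta[dir[2]]) -theta else theta
}

# Document positions taken from the answer's output.
theta <- c(-1.0378, 0.8763)

same    <- orient_scale(theta, dir = c(1, 2))  # already ordered, unchanged
flipped <- orient_scale(theta, dir = c(2, 1))  # sign reversed
```

Both orientations describe the same relative positions. Only the sign convention changes.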