
I'm mining some financial articles using tidytext. I download the data from Reuters, but when I try to turn each corpus into a data frame, I get an error saying that the unnest command does not accept functions as input.

Do you have any alternatives to get this into a tibble?

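library(tm.plugin.webmining)
library(purrr)
library(dplyr)     # data_frame(), mutate(), select(), %>%
library(tidyr)     # unnest()
library(tidytext)  # tidy() for corpora, unnest_tokens()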
company <- c("Microsoft", "Apple", "Google", "Amazon", "Facebook",
             "Twitter", "IBM", "Yahoo", "Netflix")

symbol <- c("MSFT", "AAPL", "GOOG", "AMZN", "FB", "TWTR", "IBM", "YHOO", "NFLX")

download_articles <- function(symbol) {
  WebCorpus(ReutersNewsSource(paste0("NASDAQ:", symbol)))
}

stock_articles <- data_frame(company = company, symbol = symbol) %>%
  mutate(corpus = map(symbol, download_articles))

stock_articles

stock_tokens <- stock_articles %>%
  unnest(map(corpus, tidy)) %>%
  unnest_tokens(word, text) %>%
  select(company, datetimestamp, word, id, heading)
stock_tokens
s_baldur
lgds
  • It's unclear what you're doing with that `unnest` command. If the question is just about that reshaping step, maybe you can just post the data at that step and pare the question down so we don't have to download & redo all your analysis as well – camille Jan 21 '20 at 18:45

2 Answers


I'm trying to transform the corpus column of stock_articles into a regular data frame.

It is a list-column of WebCorpus objects, so I'm trying to tidy each element and then expand the result into regular columns using unnest.

A file with the data (mydata) is available here: https://github.com/leytigeorges/miningfinancial
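
The tidy-then-unnest step described above can be sketched on toy data, without downloading anything. This is a minimal sketch, assuming the error comes from calling a function inside unnest(): it creates the tidied list-column with mutate() first and then passes only the column name to unnest(cols = ...), as tidyr 1.0.0 and later expect. The toy_tidy() helper and the toy corpus lists are hypothetical stand-ins for tidy() and the real WebCorpus objects.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for tidy(): returns one row per document.
toy_tidy <- function(docs) {
  tibble(id = names(docs), text = unlist(docs, use.names = FALSE))
}

# Toy list-column; each element mimics one company's corpus.
stock_articles <- tibble(
  company = c("Microsoft", "Apple"),
  corpus = list(
    list(a1 = "shares rose", a2 = "earnings beat"),
    list(b1 = "new phone")
  )
)

stock_tidy <- stock_articles %>%
  mutate(tidied = map(corpus, toy_tidy)) %>%  # tidy each corpus first
  select(-corpus) %>%                         # drop the raw list-column
  unnest(cols = tidied)                       # then unnest the named column

stock_tidy
```

With the real data, the same pattern would use tidy in place of toy_tidy, followed by unnest_tokens(word, text) as in the question.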

lgds

What's happening here is that some of the services have been deprecated, unfortunately, and tm.plugin.webmining is out of date. You can read some more details here. We are looking for a replacement dataset for this part of our book, but in the meantime, if you would like to explore using this code, I would recommend stripping down to just, say, 4 companies that appear to still be working.

symbol <- c("MSFT", "AAPL", "AMZN", "IBM")
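
If it is not clear in advance which sources still respond, one option is to keep the full symbol list and skip the downloads that fail. The following is a rough sketch, assuming purrr::possibly() is an acceptable way to do that; the download_articles() body here is a hypothetical placeholder that simulates failures, not the real Reuters call.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical downloader: simulates sources that no longer respond.
download_articles <- function(symbol) {
  if (!symbol %in% c("MSFT", "AAPL", "AMZN", "IBM")) stop("source unavailable")
  paste("corpus for", symbol)
}

# Wrap the downloader so that an error yields NULL instead of stopping.
safe_download <- possibly(download_articles, otherwise = NULL)

symbol <- c("MSFT", "AAPL", "GOOG", "AMZN", "FB", "TWTR", "IBM", "YHOO", "NFLX")

stock_articles <- tibble(symbol = symbol) %>%
  mutate(corpus = map(symbol, safe_download)) %>%
  filter(!map_lgl(corpus, is.null))  # keep only successful downloads

stock_articles
```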
Julia Silge