
I want to read only one specific column of my data frame as text, i.e. the third column, C, and create a word cloud from it. Let df =

A B C
1 2 sheep
2 2 sheep
3 4 goat
4 5 camel
5 2 camel
6 1 camel

I am trying to read the column with readLines(df$C), but I get the following error:

 Error in readLines(df$C) : 
  'con' is not a connection
Economist_Ayahuasca
  • Don't you just want `df$C`? – G5W Jun 12 '18 at 15:38
  • If you already have this as a data frame, does `df$C` not get what you're looking for? – camille Jun 12 '18 at 15:39
  • `readLines` is for reading lines of information from a file. If I'm understanding this correctly, you already have a data frame, so you don't need to read anything into your session. – camille Jun 12 '18 at 15:40
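
The comments' point can be checked in a few lines of base R. This is a minimal sketch: it rebuilds the question's `df` with `read.table(text = ...)`, and the names `con`, `lines`, `words` and `freq` are illustrative only. It shows that `df$C` is already text in memory, that `readLines()` only makes sense on a connection or file name, and that a plain frequency table already holds the counts a word cloud is drawn from. The `as.character()` step is an assumption covering the case where C was stored as a factor, which was the default for `read.table()` and `data.frame()` before R 4.0.0.

```r
# Rebuild the question's data frame (same data as in the answer below)
df <- read.table(text = "A B C
1 2 sheep
2 2 sheep
3 4 goat
4 5 camel
5 2 camel
6 1 camel", header = TRUE, stringsAsFactors = FALSE)

# Column C is already an in-memory character vector; nothing needs reading
str(df$C)

# readLines() reads from a connection or a file name, not from data that is
# already in memory. Wrapping the vector in textConnection() works, but it
# just returns the same strings, so it is unnecessary here.
con <- textConnection(df$C)
lines <- readLines(con)
close(con)

# If C had been stored as a factor (the default before R 4.0.0),
# as.character() turns it back into plain text.
words <- as.character(df$C)

# Word frequencies: the counts a word cloud is drawn from
freq <- table(words)
print(freq)
```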


df <- read.table(textConnection("A B C
1 2 sheep
2 2 sheep
3 4 goat
4 5 camel
5 2 camel
6 1 camel"), header = TRUE, stringsAsFactors = FALSE)

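library("quanteda")
## Package version: 1.3.0

# Treat each entry of column C as a document, count the words in a
# document-feature matrix, and plot those counts as a word cloud.
# Note: in quanteda >= 3.0, textplot_wordcloud() is in the quanteda.textplots package.
corpus(df, text_field = "C") %>%
    dfm() %>%
    textplot_wordcloud(min_count = 1)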

[Image: the resulting word cloud]

Ken Benoit
  • Just out of curiosity, is there a straightforward way to remove punctuation and stop words from the analyzed text? – Economist_Ayahuasca Jun 13 '18 at 08:41
  • Yes, in the `dfm()` call you can pass arguments to `tokens()` (see `?tokens`), and one of the `dfm()` arguments is `remove` (for removing stop words). – Ken Benoit Jun 13 '18 at 08:52
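
The answer's comment refers to quanteda 1.3.0. Recent quanteda releases (3.0 and later) deprecate passing these arguments through `dfm()`, so this minimal sketch tokenizes explicitly with `tokens(..., remove_punct = TRUE)` and then drops stop words with `tokens_remove()` and `stopwords("en")`. The text in column C is hypothetical, with punctuation and stop words added so their removal is visible, since the question's data contains neither.

```r
library("quanteda")

# Hypothetical column C containing punctuation and stop words
df <- data.frame(
  A = 1:5,
  C = c("The sheep!", "A sheep", "Goat.", "camel", "Camel, camel"),
  stringsAsFactors = FALSE
)

# Tokenize column C, dropping punctuation tokens such as "!" and ","
toks <- tokens(corpus(df, text_field = "C"), remove_punct = TRUE)

# Drop English stop words ("the", "a", ...); matching ignores case by default
toks <- tokens_remove(toks, pattern = stopwords("en"))

# Count the remaining words; dfm() lowercases features by default
dfmat <- dfm(toks)
freqs <- topfeatures(dfmat)
print(freqs)

# The word cloud is then drawn as in the answer, e.g.
# textplot_wordcloud(dfmat, min_count = 1)
# (from the quanteda.textplots package in quanteda >= 3.0)
```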