Obtaining textual data from a single column in a dataframe

Question

I want to read only one specific column of my dataframe as text, namely the third column, C, and create a word cloud from it. Let df be:

A B C
1 2 sheep
2 2 sheep
3 4 goat
4 5 camel
5 2 camel
6 1 camel

I am trying to read the column with readLines(df$C), but I get the following error:

 Error in readLines(df$C) : 
  'con' is not a connection

If you already have this as a data frame, does `df$C` not get what you're looking for? — camille, Jun 12 '18 at 15:39
`readLines` is for reading lines of information from a file. If I'm understanding this correctly, you already have a data frame, so you don't need to read anything into your session — camille, Jun 12 '18 at 15:40
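
The point made in these comments can be sketched in base R. This is a minimal illustration rather than part of the original thread: the data frame is rebuilt by hand from the question's table, and the variable names `words`, `word_freq`, and `same_words` are chosen here for illustration only.

```r
# Recreate the example data frame from the question
df <- data.frame(
  A = 1:6,
  B = c(2, 2, 4, 5, 2, 1),
  C = c("sheep", "sheep", "goat", "camel", "camel", "camel"),
  stringsAsFactors = FALSE
)

# The column is already a character vector in the session; nothing needs reading
words <- df$C
is.character(words)

# Word frequencies, the kind of counts a word cloud is drawn from
word_freq <- table(words)
word_freq

# readLines() expects a connection (or a file path); wrapping the vector in
# textConnection() works, but it only hands back the same text again
con <- textConnection(df$C)
same_words <- readLines(con)
close(con)
identical(same_words, words)
```

If the column had been read in as a factor, `as.character(df$C)` would convert it to plain text first.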

score 2 · Accepted Answer · answered Jun 12 '18 at 15:43

df <- read.table(textConnection("A B C
1 2 sheep
2 2 sheep
3 4 goat
4 5 camel
5 2 camel
6 1 camel"), header = TRUE, stringsAsFactors = FALSE)

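library("quanteda")
## Package version: 1.3.0

# Build a corpus from column C, convert it to a document-feature matrix,
# and plot every word that occurs at least once
corpus(df, text_field = "C") %>%
    dfm() %>%
    textplot_wordcloud(min_count = 1)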

Ken Benoit

Just out of curiosity, is there a straightforward way to remove punctuation and stop words from the text being analyzed? – Economist_Ayahuasca Jun 13 '18 at 08:41

Yes, in the `dfm()` call you can pass arguments to `tokens()` (see `?tokens`), and one of the `dfm()` arguments is `remove` (for removing stopwords). – Ken Benoit Jun 13 '18 at 08:52
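
A rough sketch of that cleanup, written as explicit steps rather than as arguments to `dfm()`: punctuation is dropped in `tokens()` and stop words with `tokens_remove()`, which should achieve the same result as the `remove` argument described above. The two sample sentences and the names `txt`, `toks`, and `dfmat` are made up for illustration, since the question's data contains no punctuation or stop words.

```r
library("quanteda")

# Hypothetical sample text containing punctuation and stop words
txt <- c(d1 = "The sheep, the goat!", d2 = "A camel and a camel.")

# Tokenize and drop punctuation in the same step
toks <- tokens(corpus(txt), remove_punct = TRUE)

# Drop English stop words (matching is case-insensitive by default)
toks <- tokens_remove(toks, stopwords("english"))

# Build the document-feature matrix; dfm() lowercases features by default
dfmat <- dfm(toks)
featnames(dfmat)
```

The resulting `dfmat` can then be passed to `textplot_wordcloud()` in the same way as in the answer.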
