
I have the following dataframe:

str(dat2)
data.frame: 29081 obs. of 105 variables:
$ id: int 20 34 46 109 158....
$ reddit_id: chr "t1_cnas90f" "t1_cnas90t" "t1_cnas90g"....
$ subreddit_id: chr "t5_cnas90f" "t5_cnas90t" "t5_cnas90g"....
$ link_id: chr "t3_c2qy171" "t3_c2qy172" "t3_c2qy17f"....
$ created_utc: chr "2015-01-01" "2015-01-01" "2015-01-01"....
$ ups: int 3 1 0 1 2....
...

How can I change the data type of reddit_id, subreddit_id and link_id from character to factor? I know how to do it one column at a time, but since that is tedious, I am looking for a faster way.

I have tried the following, without success:

dat2[2:4] <- data.frame(lapply(dat2[2:4], factor))

This comes from another approach I found. It ends up giving me an error message: invalid "length" argument

Another approach was to do it this way:

dat2 <- as.factor(data.frame(dat2$reddit_id, dat2$subreddit_id, dat2$link_id))

Result: Error in sort.list(y): "x" must be atomic for "sort.

After reading the error, I also tried it the other way around:

dat2 <- data.frame(as.factor(dat2$reddit_id, dat2$subreddit_id, dat2$link_id))

Also without success.

If any information is missing, I am sorry. I am a newbie to R and Stack Overflow. Thank you for your help!

Arthur Pennt

1 Answer


Try with:

library("tidyverse")

# Convert the three ID columns to factors and store the result back in dat2
dat2 <- dat2 %>% 
  mutate_at(.vars = vars(reddit_id, subreddit_id, link_id), 
            .funs = factor)
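
In recent dplyr versions (1.0.0 and later), mutate_at() is superseded by across(), so the same conversion can also be sketched as below. The toy data frame is only a made-up stand-in for dat2, with invented values, so that the snippet runs on its own:

```r
library(dplyr)

# Hypothetical stand-in for dat2; the values are invented for illustration
toy <- data.frame(
  id           = c(20L, 34L, 46L),
  reddit_id    = c("t1_a", "t1_b", "t1_a"),
  subreddit_id = c("t5_x", "t5_x", "t5_y"),
  link_id      = c("t3_p", "t3_q", "t3_p"),
  stringsAsFactors = FALSE
)

# across() applies factor() to each selected column; other columns are untouched
toy <- toy %>%
  mutate(across(c(reddit_id, subreddit_id, link_id), factor))

str(toy)
```

With the real data this would be the same call with dat2 in place of toy.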

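To select columns by matching part of their names, use a selection helper such as contains():

# contains("reddit") picks up both reddit_id and subreddit_id
dat2 <- dat2 %>% 
  mutate_at(.vars = vars(contains("reddit"), link_id), 
            .funs = factor)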
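
If you would rather not load any packages, a base R version along the following lines should also work. This is a minimal sketch that selects the columns by name instead of by position; toy is again a made-up stand-in for dat2, and cols is just a helper name chosen here:

```r
# Hypothetical stand-in for dat2; the values are invented for illustration
toy <- data.frame(
  id           = c(20L, 34L, 46L),
  reddit_id    = c("t1_a", "t1_b", "t1_a"),
  subreddit_id = c("t5_x", "t5_x", "t5_y"),
  link_id      = c("t3_p", "t3_q", "t3_p"),
  stringsAsFactors = FALSE
)

# Names of the columns to convert
cols <- c("reddit_id", "subreddit_id", "link_id")

# lapply() returns a list of factors, which is assigned back column by column
toy[cols] <- lapply(toy[cols], factor)

sapply(toy, class)
```

Note that factor() and as.factor() each take a single vector, which is why passing a whole data frame or several columns at once to as.factor() fails; lapply() gets around that by calling factor() once per column.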
Oscar Montoya