I'm trying to inspect a dataset to understand all of the different categorical values a column can take on.

The actual dataset I'm using has 100,000+ rows and I have no idea what's in it.

To illustrate, take the following data frame:

a<-(1:10)
b<-c("a,b","c,d","c","c","a","a,d","b,d","c","c","a")
example_df <- data.frame(a,b)
example_df

I would like a function that will return: a,b,c,d

I have tried using the `unique` function, but this doesn't work because it returns the combinations rather than the individual values:

uni <- unique(example_df$b)
uni
[1] a,b c,d c   a   a,d b,d
Levels: a a,b a,d b,d c c,d

Does anyone know of a solution for this?

2 Answers

We can split the 'b' column on `,` into a list with `strsplit`, `unlist` it to a vector, and take the `unique` elements:

unique(unlist(strsplit(as.character(example_df$b), ",")))
#[1] "a" "b" "c" "d"
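
Since the question asks for a function, the one-liner above could be wrapped up roughly as follows. This is only a sketch: `unique_tokens` is a hypothetical name, and trimming whitespace, dropping `NA`/empty pieces, and sorting the result are assumptions about what is wanted, not part of the original answer.

```r
# Hypothetical helper; the name and defaults are assumptions.
unique_tokens <- function(x, sep = ",") {
  # as.character() guards against factor columns, which strsplit() rejects
  parts <- unlist(strsplit(as.character(x), sep, fixed = TRUE))
  parts <- trimws(parts)                         # tolerate "a, b" as well as "a,b"
  parts <- parts[!is.na(parts) & nzchar(parts)]  # drop NA and empty pieces
  sort(unique(parts))
}

b <- c("a,b", "c,d", "c", "c", "a", "a,d", "b,d", "c", "c", "a")
unique_tokens(b)
#[1] "a" "b" "c" "d"
unique_tokens(factor(b))  # a factor column gives the same result
#[1] "a" "b" "c" "d"
```

Using `fixed = TRUE` treats the separator as a literal string, which avoids surprises if the delimiter happens to be a regex metacharacter such as `|` or `.`.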
akrun
  • Thanks! I tried this and received the error "non-character argument". Do you have any insight about this? – PortMadeleineCrumpet Nov 21 '20 at 23:35
  • @PortMadeleineCrumpet Maybe you have a `factor` column. You can convert it to `character` with `as.character`; I've updated the post. From R 4.0, `stringsAsFactors = FALSE` is the default. – akrun Nov 21 '20 at 23:37

You can use `separate_rows` from tidyr to split the values into separate rows and `distinct` from dplyr to keep the unique ones.

library(dplyr)

example_df %>%
  mutate(b = as.character(b)) %>%
  tidyr::separate_rows(b,sep = ',') %>%
  distinct(b)

#   b    
#  <chr>
#1 a    
#2 b    
#3 c    
#4 d    
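
With 100,000+ rows it may also help to see how often each value occurs, not just which values exist. A possible variation on the pipeline above, assuming dplyr and tidyr are installed, swaps `distinct` for `count`; the `sort = TRUE` ordering is a choice made here for illustration, not something the answer prescribes.

```r
library(dplyr)

# Same example data as in the question
example_df <- data.frame(
  a = 1:10,
  b = c("a,b", "c,d", "c", "c", "a", "a,d", "b,d", "c", "c", "a")
)

value_counts <- example_df %>%
  mutate(b = as.character(b)) %>%          # in case b is a factor
  tidyr::separate_rows(b, sep = ",") %>%   # one row per individual value
  count(b, sort = TRUE)                    # tally each value, most frequent first

value_counts
# b = "c" occurs 5 times, "a" 4 times, "d" 3 times, "b" 2 times
```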
Ronak Shah