My ultimate goal is to create a quanteda dictionary to use for topic classification on text data.
However, my topic keywords are stored in a somewhat different format: I have a column of about 4000 keywords and a second column that specifies the topic each keyword belongs to. Note that the number of keywords differs from topic to topic. My data looks like this:
keywords topic
[1] "one" "number"
[2] "two" "number"
[3] "three" "number"
[4] "triangle" "form"
[5] "circle" "form"
[...]
How can I transform my keywords into a (quanteda) dictionary format, i.e. a named list with one character vector per topic that holds the keywords for that topic?
The list should look like this:
list(number = c("one","two","three"),
form = c("triangle","circle"))
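For reference, a minimal sketch of one base-R route that seems to produce this shape, assuming the data sits in a data frame called keyword_list with character columns keywords and topic (the sample rows below are made up to mirror the excerpt above). split() groups the first vector by the values of the second and returns a named list:

```r
# Hypothetical sample data mirroring the layout shown above
keyword_list <- data.frame(
  keywords = c("one", "two", "three", "triangle", "circle"),
  topic    = c("number", "number", "number", "form", "form"),
  stringsAsFactors = FALSE
)

# split() groups the keywords by topic and returns a named list
# with one character vector per topic
mylist <- split(keyword_list$keywords, keyword_list$topic)

# The named list could then be handed to quanteda; this call is left
# commented out so the sketch runs without quanteda installed
# mydictionary <- quanteda::dictionary(mylist)
```

Note that split() orders the list elements by sorted topic name rather than by order of appearance, which should not matter for a dictionary lookup.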
Any help is much appreciated!
My approach so far is below, but it doesn't seem right to me, and I can't get it fully working:
# 1) Initialize an empty list of vectors that corresponds to my number of topics & add topic names ("topic_names" is just a vector type chr 1:88 that includes the topic names)
mydictionary <- vector(mode = "list", length = 88)
names(mydictionary) <- topic_names
# 2) Create a loop that checks for each keyword to match a topic and adds it to the respective vector of my dictionary
# I got it working for one keyword like this:
if (names(mydictionary)[1] == keyword_list$topic[1]) { # if the keyword's topic matches the name of the topic vector
  mydictionary[[1]] <- c(mydictionary[[1]], keyword_list$keywords[1]) # add the keyword to the topic vector
}
# However, I don't know how to turn this into a loop, since it would have to check every row of keyword_list against every element of mydictionary, and I don't know how to achieve this...
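
A hedged sketch of how the single-keyword step above might generalize: instead of comparing every keyword against every topic, the list can be indexed by topic name, so one pass over the rows is enough. The sample data is hypothetical, and topic_names is derived from it here rather than taken from my real 88-topic vector:

```r
# Hypothetical sample data with the same two columns as keyword_list
keyword_list <- data.frame(
  keywords = c("one", "two", "three", "triangle", "circle"),
  topic    = c("number", "number", "number", "form", "form"),
  stringsAsFactors = FALSE
)

# Assumption: topic names are taken from the data in order of appearance
topic_names <- unique(keyword_list$topic)

# Empty list with one (initially NULL) slot per topic
mydictionary <- vector(mode = "list", length = length(topic_names))
names(mydictionary) <- topic_names

# One pass over the rows: look up the slot by topic name and append
for (i in seq_len(nrow(keyword_list))) {
  topic <- keyword_list$topic[i]
  mydictionary[[topic]] <- c(mydictionary[[topic]], keyword_list$keywords[i])
}
```

Growing vectors inside a loop is not the fastest pattern in R, but for roughly 4000 keywords it should be fine.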