I am working on creating topic models based on Tweets in R using the topicmodels package.
I want to create a dataframe containing all the results from the topic model so that I can insert it into a database. This is how I do it:
# create dataframe with relevant results
topics <- as.data.frame(ldaTopics@gamma)
topics$id <- as.character(ldaTopics@documents)
topics$topic <- topics(ldaTopics)
# reorder columns to match table structure in database
reordered_topics <- topics[,c(6, 1, 2, 3, 4, 5, 7)]
# write results to db
dbWriteTable(con, "topics", value = reordered_topics, append = TRUE, row.names = FALSE)
Now my problem: when I write the dataframe to the database, I get an error saying that there are duplicate ids:
RS-DBI driver: (could not Retrieve the result : ERROR: duplicate key value violates unique constraint "topics_pkey" DETAIL: Key (id)=(1) already exists. CONTEXT: COPY topics, line 1
This is strange, since I have checked whether the ids are indeed unique with `SELECT COUNT(DISTINCT id) FROM tweets;`. The number returned was the same as the count of all ids in the entire table.
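For reference, here is a minimal sketch of how the same uniqueness check could be run on the R side, against the dataframe itself rather than the tweets table. The toy dataframe and the `existing_ids` vector are made up for illustration and are not my real data; in practice the existing keys would probably come from a query against the `topics` table.

```r
# Toy stand-in for reordered_topics; the values are invented for illustration.
reordered_topics <- data.frame(
  id = c("1", "2", "2", "3"),
  topic = c(1L, 3L, 3L, 2L),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when all values are unique,
# otherwise the index of the first duplicated value.
first_dup <- anyDuplicated(reordered_topics$id)

# Ids that occur more than once inside the dataframe.
dup_ids <- unique(reordered_topics$id[duplicated(reordered_topics$id)])

# Ids that would clash with keys already stored in the target table.
# existing_ids is a hypothetical vector; it could be filled with something like
# dbGetQuery(con, "SELECT id FROM topics")$id.
existing_ids <- c("1", "7")
clashing_ids <- intersect(reordered_topics$id, existing_ids)
```

If `first_dup` is 0 and `clashing_ids` is empty, the ids in the dataframe should not violate the primary key. Otherwise the values in the `id` column are presumably not the tweet ids I expected.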
I think something is going wrong in the way I combine the topicmodels results into a dataframe, but I cannot figure out what. Does anyone know what is happening here?