I am working on creating topic models based on Tweets in R using the topicmodels package.

I want to create a dataframe containing all the results from the topic model so that I can insert it into a database. This is how I do it:

# create data frame with relevant results
# (named topic_df so the data frame does not mask the topics() function)
topic_df <- as.data.frame(ldaTopics@gamma)
topic_df$id <- as.character(ldaTopics@documents)
topic_df$topic <- topics(ldaTopics)

# reorder columns to match table structure in database (id, V1..V5, topic)
reordered_topics <- topic_df[, c(6, 1, 2, 3, 4, 5, 7)]

# write results to db
dbWriteTable(con, "topics", value = reordered_topics, append = TRUE, row.names = FALSE)

Now my problem: when I write my dataframe to the database, I get an error saying that there are duplicate IDs:

RS-DBI driver: (could not Retrieve the result : ERROR: duplicate key value violates unique constraint "topics_pkey" DETAIL: Key (id)=(1) already exists. CONTEXT: COPY topics, line 1

This is strange, since I have checked that the IDs are indeed unique with `SELECT COUNT(DISTINCT id) FROM tweets;`. The number returned was the same as the number of IDs in the entire table.

I think something is going wrong with my way of combining the topicmodels results into a dataframe. But I cannot figure out what is happening. Does anyone know what is happening here?
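One way to narrow this down is to check the IDs in R before writing. The sketch below is only an illustration: `find_key_conflicts` is a hypothetical helper, and the toy data stands in for `reordered_topics` and for the IDs already stored in the `topics` table. It reports IDs repeated within the data frame and IDs that already exist in the table. The second case can occur when the write is re-run with `append = TRUE`. The key in the error is `1`, which looks more like a document index than a tweet ID, so it may also be worth checking what `ldaTopics@documents` actually contains.

```r
# Hypothetical helper: report ids that would violate a primary key.
# new_ids:      ids about to be written
# existing_ids: ids already present in the target table
find_key_conflicts <- function(new_ids, existing_ids = character(0)) {
  new_ids <- as.character(new_ids)
  existing_ids <- as.character(existing_ids)
  list(
    # ids that occur more than once in the data frame itself
    duplicated_in_frame = unique(new_ids[duplicated(new_ids)]),
    # ids that are already stored, e.g. from an earlier append
    already_in_table = intersect(new_ids, existing_ids)
  )
}

# Toy data standing in for reordered_topics (assumption, not real output).
reordered_topics <- data.frame(
  id = c("1", "2", "3"),
  topic = c(2L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Assumption: in practice these would come from the database, e.g.
# existing_ids <- dbGetQuery(con, "SELECT id FROM topics")$id
existing_ids <- c("1", "2")

conflicts <- find_key_conflicts(reordered_topics$id, existing_ids)
print(conflicts)
```

If `already_in_table` is non-empty, the duplicates come from rows written earlier rather than from the way the data frame is built.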

Cyrus Mohammadian
Roska
  • Could you paste the output from `dput(head(reordered_topics))` here so we can see what it looks like? And what is the structure of your `topics` table in the database? Maybe check that your primary key has AUTO_INCREMENT on. – Tutuchan Sep 16 '16 at 18:19
  • I think I solved my issue. I removed the original IDs (Twitter message IDs) and assigned an auto-increment primary key to each tweet. Now I don't get the duplicate key error anymore. @Tutuchan to answer your database structure question: I created an empty table with a primary key column (I did not specifically turn auto increment on). I populated the database from a CSV. Is it possible that the CSV contained duplicates? I thought the primary key setting would prevent that from happening. – Roska Sep 18 '16 at 08:12
  • Just for the record: I ran the topic models again with the new IDs. – Roska Sep 18 '16 at 08:27

0 Answers