0

I have this (somewhat) working script that takes comments from a Reddit subreddit and insert them into a MySQL database. The spaghetti code I ended up with, however, pastes one character from each comment to each row of the database instead of one comment per row.

I have searched through but couldn't find any instance like this on previous cases. Below is a screenshot of how MySQL look like + the code snippet. enter image description here

import praw
import configReddit
import mysql.connector
import configMySQL

# ---------------- REDDIT ---------------- #
reddit = praw.Reddit(
    client_id=configReddit.client_id,
    client_secret=configReddit.client_secret,
    password=configReddit.password,
    user_agent=configReddit.user_agent,
    username=configReddit.username,
    charset='utf8mb4'
)



# ---------------- DATABASE ---------------- #

mydb = mysql.connector.connect(
  host=configMySQL.host,
  user=configMySQL.user,
  password=configMySQL.password,
  database="db_blockchainstable"
)

mycursor = mydb.cursor()
sql = "INSERT INTO test (rdtext) VALUES (%s)"


# ---------------- EXE ---------------- #

for comment in reddit.subreddit("news").stream.comments():
    mycursor.executemany(sql, comment.body)
    mydb.commit()
    print(comment.body)

Below what the console is returning based on the print(comment.body). enter image description here

Also, if I change mycursor.executemany(sql, db) to mycursor.execute(sql, db) I get this error:

enter image description here

If I wrap %s in ' ' --> '%s' the database records %s as value (see below).

enter image description here

A. Prats
  • 59
  • 10
  • does `print(comment.body)` look ok? Try inserting a newline after? – jared Apr 14 '21 at 07:05
  • Yeah the print looks totally fine, one line per comment in the PyCharm console. Could you elaborate on the adding a new line? I’m rather new to these libraries and Python in general. – A. Prats Apr 14 '21 at 07:49
  • The newline is just to see if the comment.body is printing a line or character. You can simply replace `print` with `println` – jared Apr 14 '21 at 08:36
  • Hmm... it returns println is not defined. Is this a Python command? If you want you can take a look to the console and what it returns based on that print(comment.body) (included now in original the post above). – A. Prats Apr 14 '21 at 09:50
  • Never mind, it's added by default in python. By bad mixing languages – jared Apr 14 '21 at 09:54

2 Answers2

0

mycursor.executemany(sql, comment.body) expects lists or tuples. When supplied with a string it will iterate over each character. Replace with mycursor.execute(sql, comment.body)

jared
  • 473
  • 3
  • 16
  • Hey Jared, thanks again for the support. When I change it to .execute it throws an error :/ - I have adapted the original post with more info if you want to take a look, man ;) – A. Prats Apr 14 '21 at 10:08
  • Wrap the `%s` in quotes like `'%s'` – jared Apr 14 '21 at 12:06
  • Yeah, I tried that but I think it assigns that as value %s and it records that on the db (see original post updated). – A. Prats Apr 14 '21 at 12:28
  • Hey man, I hope I'm not bothering, but I've made some progress and now the code is (partially) working. Indeed it had to do with the list / tuples. But honestly I'm rather lost. The solution I found was dict + zip() but is limiting the characters to 500. Any idea what could help? Please see my other answer ;) – A. Prats Apr 14 '21 at 21:35
0

I have come up with an even spaghettier code that works partially... I have certainly made some progress but I don't think is the final answer. Nevertheless, I will add it here in case it serves anyone.

I did the following changes:

  1. Created a dictionary dt = []
  2. Appended comment body to that dictionary dt.append(comment.body)
  3. Then I wrap db within mycursor.executemany in zip as mycursor.executemany(sql, zip(db)

The combination of appending everything onto a dictionary + ZIP makes it work and now I have one comment per row in my MySQL database. This is an answer from a different question in this forum that helped me to get there.

I believe executemany wants a list of tuples each one containing one row. That is not what .items() provide, since it will provide a key and the values associated with that key on each iteration. Luckly the zip() function does exactly what you need:

That said, a) I don't really understand why is working with ZIP and b) I need other alternatives and I don't know any others.

The reason is because ZIP, I believe only allows for max of 500 characters, every time a comment has more than 500 it throws an error. Which means that I'm not carrying 100% of the information from reddit as some comments are above 500 characters.

mysql.connector.errors.DataError: 1406 (22001): Data too long for column 'rdtext' at row 87

Here the new code:

import praw
import configReddit
import mysql.connector
import configMySQL

# ---------------- REDDIT ---------------- #
reddit = praw.Reddit(
    client_id=configReddit.client_id,
    client_secret=configReddit.client_secret,
    password=configReddit.password,
    user_agent=configReddit.user_agent,
    username=configReddit.username,
    charset='utf8mb4'
)



# ---------------- DATABASE ---------------- #

mydb = mysql.connector.connect(
    host=configMySQL.host,
    user=configMySQL.user,
    password=configMySQL.password,
    database="db_blockchainstable",
)

mycursor = mydb.cursor()
sql = "INSERT INTO test (rdtext) VALUES (%s)"


# ---------------- EXE ---------------- #


dt = []

for comment in reddit.subreddit("News").stream.comments():
    print(comment.body)
    print(len(comment.body))
    dt.append(comment.body)
    rdtext = dt
    db = (rdtext)
    mycursor.executemany(sql, zip(db))
    mydb.commit()
A. Prats
  • 59
  • 10
  • Zip actually did it. The 500 character limit was actually coming from the VARCHAR on MySQL! :D Changed that, all good. – A. Prats Apr 18 '21 at 08:32