
I have written a script to help me work with a database. Specifically, I am trying to work with files on disk and add the result of this work to my database. I have copied the code below, but removed most of the logic which isn't related to my database to try to keep this question broad and helpful.

I used the code to operate on the files and add the result to the database, overwriting any files with the same identifier as the one I was working on. Later, I modified the script to ignore documents which have already been added to the database, and now whenever I run it I get an error:

pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")

It seems like the server is rejecting the requests, possibly because I have written my code poorly? I have noticed that the error always occurs at the same place in the list of files, which doesn't change. If I re-run the code, replacing the file list with a list containing only the file on which the program crashes, it works fine. This makes me think that after making a certain number of requests, the database just bottoms out.

I'm using Python 3 and MySQL Community Edition Version 14.14 on OS X.

Code (stripped of stuff that doesn't have to do with the database):

import pymysql

# Asterisks stand in for user-specific values
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='*******',
                             db='*******',
                             use_unicode=True, 
                             charset="utf8mb4",
                             )
cursor = connection.cursor()

f_arr = []  # An array of all of my data objects

def convertF(file_):
    # General layout: Try to work with the input and add the result to the DB. The work can raise an exception
    # If the record already exists in the DB, ignore it
    # Elif the work was already done and the result is on disk, put it on the database
    # Else do the work and put it on the database - this can raise exceptions
    # Except: Try another way to do the work, and put the result in the database. This can raise an error
    # Second (nested) except: Add the record to the database with indicator that the work failed

    # This worked before I added the initial check on whether or not the record already exists in the database. Now, for some reason, I get the error:
    # pymysql.err.OperationalError: (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")

    # I'm pretty sure I have written code that interacts poorly with the database. I had hoped to finish this task quickly rather than efficiently.
    try:
        # Find record in DB, if text exists just ignore the record
        rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = '{0}'".format(file_["name"]))
        length = cursor.fetchall()[0][0] # Gets the length
        if length != None and length > 4:
            pass
        elif ( "work already finished on disk" ): 
            # get "result_text" from disk
            cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
            cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
            connection.commit()
        else:
            # do work to get result_text
            cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
            cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
            connection.commit()
    except:
        try: 
            # Alternate method of work to get result_text
            cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
            cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
            connection.commit()
        except:
            # Since the job can't be done, tell the database
            cmd = "UPDATE table SET text = %s, hascontent = 0 WHERE name = %s"
            cursor.execute(cmd, ( "NO CONTENT", file_["name"]) )
            connection.commit()

for file in f_arr:
    convertF(file)

2 Answers


MySQL server has gone away

This problem is described extensively at http://dev.mysql.com/doc/refman/5.7/en/gone-away.html. The usual cause is that the server has closed the connection for some reason, and the usual remedy is to retry the query, or to reconnect and retry.
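One lightweight way to apply the reconnect-and-retry remedy with pymysql is to call `Connection.ping(reconnect=True)` before each statement, which re-opens the connection if the server has dropped it. A minimal sketch (the `safe_execute` helper is illustrative, not part of your code):

```python
def safe_execute(connection, sql, params):
    # pymysql's Connection.ping(reconnect=True) re-establishes a dropped
    # connection, so a "server has gone away" between statements is healed
    # before the next execute() runs.
    connection.ping(reconnect=True)
    with connection.cursor() as cursor:
        cursor.execute(sql, params)
    connection.commit()
```

Note this guards against idle disconnects; it will not help if a single oversized query (e.g. one exceeding `max_allowed_packet`) kills the connection every time.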

But why this breaks your code is because of the way you have written your code. See below

Possibly because I have written my code poorly?

Since you asked.

rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = '{0}'".format(file_["name"]))

This is a bad habit: the manual explicitly warns against it because it opens you up to SQL injection. The correct way is

rc = cursor.execute("SELECT LENGTH(text) FROM table WHERE name = %s", (file_["name"],))

The second problem with the above code is that you don't need to check whether a value exists before you try to update it. You can delete that line and its associated if/else and jump straight to the update. Besides, your elif and else branches do exactly the same thing. So your code can just be

try:
        cmd = "UPDATE table SET text = %s, hascontent = 1 WHERE name = %s"
        cursor.execute(cmd, ( pymysql.escape_string(result_text), file_["name"] ))
        connection.commit()
except:  # <-- next problem.

And we come to the next problem. Never catch generic exceptions like this; always catch specific exceptions such as TypeError or AttributeError. When catching a generic exception is unavoidable, you should at least log it.

For example, here you could catch connection errors and attempt to reconnect to the database. Then the code will not stop executing when your "server has gone away" problem happens.
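A sketch of that pattern (the `run_with_retry` helper and its parameter names are illustrative; for pymysql you would pass `retriable=(pymysql.err.OperationalError,)` and a reconnect callback such as `lambda: connection.ping(reconnect=True)`):

```python
import logging

def run_with_retry(do_query, reconnect, retries=1,
                   retriable=(ConnectionError,)):
    """Run do_query(); on a retriable error, log it, reconnect, and retry."""
    for attempt in range(retries + 1):
        try:
            return do_query()
        except retriable as exc:
            logging.warning("query failed (%s), attempt %d", exc, attempt + 1)
            if attempt == retries:
                raise  # out of retries: surface the real error to the caller
            reconnect()
```

Catching only the specific retriable class means a genuine bug (say, a KeyError in your own logic) still propagates instead of being silently swallowed.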

e4c5
  • Thank you, I'll make those fixes. The first part of the if loop is necessary because I want to avoid re-doing the "work" to get the content for the database, not make sure that the record already exists before update. – TheNationalSarcasmSociety Jan 14 '17 at 16:17
  • Didn't understand your comment – e4c5 Jan 14 '17 at 23:52
  • I just meant that the if loop isn't only for checking if the value exists before I update - I've got other, necessary logic in there that I cleaned out when I tried to get rid of code that wasn't important to the question. Sorry about that. – TheNationalSarcasmSociety Jan 15 '17 at 05:32
  • I have updated my code by making the connection object an argument of the convertF method, and then making and closing the cursor in the method. In the for loop, I have enclosed convertF in a `try` block so that I may catch the OperationalError, close the old connection, make a new connection, and re-try to convertF. When I re-try the SQL statement where the pipe broke, it breaks again. Could it be the statement itself? Any ideas? Also, I think that removing most of my code has made it hard for you to answer this question, so I put the whole thing [here](https://hastebin.com/tofivizilu.py) – TheNationalSarcasmSociety Jan 15 '17 at 06:10
  • can you actually post that as a new question please – e4c5 Jan 15 '17 at 06:15
  • Yes. Do you recommend that I use stripped code, or full code? – TheNationalSarcasmSociety Jan 15 '17 at 14:42
  • Well it's hard to say but do include a bit more than you have included here – e4c5 Jan 15 '17 at 15:07
  • I did more research and discovered my issue. One of the documents I'm working with is unbelievably large for what it is, and so my query exceeds the `max_allowed_packet` – TheNationalSarcasmSociety Jan 15 '17 at 16:43
  • "remedy is to retry the query or to reconnect and retry" Helpful. Thanks @e4c5. I did this by implementing a try / Except pymysql.err.OperationalError, which simply reconnects and gets a new cursor and then tries the query again. – dlink Jul 07 '21 at 20:25

I solved the same error when making bulk inserts by reducing the number of rows I inserted in one command.

Even though the maximum number of rows allowed for a bulk insert was much higher, I still got this kind of error.
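A sketch of that workaround: split the rows into fixed-size batches so each INSERT statement stays well under the server's `max_allowed_packet` limit (the `chunked` helper, batch size, and table/column names below are illustrative):

```python
def chunked(rows, size):
    # Yield successive fixed-size slices of rows; the last slice may be shorter.
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

# Usage sketch with pymysql (names are assumptions, not from the answer):
# for batch in chunked(all_rows, 500):
#     cursor.executemany("INSERT INTO mytable (name, text) VALUES (%s, %s)", batch)
#     connection.commit()
```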

Armin Okić
  • 2
    Using the `pd.to_sql()` function, and passing the param `chunksize` is one way to do this which worked for me. chunksize : int, optional Rows will be written in batches of this size at a time. By default, all rows will be written at once. – Halee Jan 25 '19 at 15:37