
I am trying to retrieve the contents of my sqlite3 database and update this data using a scraper in a for loop.

The presumed flow is as follows:

  • Retrieve all rows from the dataset
  • For each row, find the URL column and fetch some additional (updated) data
  • Once this data has been obtained, upsert it (update the row, adding columns if they don't exist yet) into the row the URL was taken from.
  • I love the dataset package because of 'upsert', which dynamically adds whatever columns I may have added to the database if they don't exist yet.

    My code produces an error I can't explain, however.

    'ResourceClosedError: This result object is closed.'
    

    How would I go about obtaining my goal without running into this? The following snippet recreates my issue.

    import dataset
    db = dataset.connect('sqlite:///test.db')

    # Add two dummy rows
    testrow1 = {'TestID': 1}
    testrow2 = {'TestID': 2}
    db['test'].upsert(testrow1, ['TestID'])
    db['test'].upsert(testrow2, ['TestID'])
    print("Inserted testdata before loop")
    
    # This works fine
    testdata = db['test'].all()
    for row in testdata:
        print(row)
    # This gives me a 'ResourceClosedError: This result object is closed.' error?
    i = 1 # 'i' here exemplifies data that I'll add through my scraper.
    testdata = db['test'].all()
    for row in testdata:
        data = {'TestID': i+1000}
        db['test'].upsert(data, ['TestID'])
        print("Upserted within loop (i = " + str(i) + ")")
        i += 1
    
    MattV

    1 Answer


    The issue might be that you are querying the dataset, reading the whole result object in a loop (under "this works fine"), and then immediately trying to loop over that same result object again while doing upserts. The error is telling you that the resource has been closed: once you have read a result, its connection is closed automatically (as a feature!). (See this answer about 'automatic closing' for more on the why and ways to get around it.)

    Given that result resources tend to get closed, try fetching the results again at the beginning of your upsert loop:

    i = 1 # 'i' here exemplifies data that I'll add through my scraper.
    testdata = db['test'].all()
    for row in testdata:
        data = {'TestID': i}
        db['test'].upsert(data, ['TestID'])
        print("Upserted within loop (i = " + str(i) + ")")
        i += 1
    

    Edit: See the comments — the code above still modifies the table that testdata is being read from inside the loop, so it gives the same error. A way to get around this is to read the data into a list first and then loop over that list to do the updates. Something like:

    i = 1 # 'i' here exemplifies data that I'll add through my scraper.
    testdata = [row for row in db['test'].all()]
    for row in testdata:
        data = {'TestID': i}
        db['test'].upsert(data, ['TestID'])
        print("Upserted within loop (i = " + str(i) + ")")
        i += 1
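    Applied to the scraper flow described in the question, the same materialize-first pattern might look like the sketch below. Here fetch_updated_data is a hypothetical stand-in for the real scraper, and the URL column and its values are made up for illustration:

    ```python
    import dataset

    db = dataset.connect('sqlite:///:memory:')

    # Seed two rows, mirroring the question's setup.
    db['test'].upsert({'TestID': 1, 'URL': 'http://example.com/1'}, ['TestID'])
    db['test'].upsert({'TestID': 2, 'URL': 'http://example.com/2'}, ['TestID'])

    def fetch_updated_data(url):
        # Stand-in for the real scraper: return whatever new
        # columns were scraped for this URL.
        return {'Title': 'scraped from ' + url}

    # Materialize the result set first, so the upserts below never
    # touch a result object that is still being read.
    rows = list(db['test'].all())
    for row in rows:
        update = {'TestID': row['TestID']}
        update.update(fetch_updated_data(row['URL']))
        # 'Title' did not exist before; upsert adds the column on the fly.
        db['test'].upsert(update, ['TestID'])
    ```

    Because the upsert is keyed on TestID, each scraped result lands back in the row its URL came from, and any new columns are created automatically.
    
    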
    
    brobas
    • Thanks! You're definitely right on those first errors ;-) I edited the question. Your solution doesn't seem to get rid of the error though, I found a workaround by selecting the items into a list prior to the loop -- it seems to not like looping over it while updating the same table? – MattV Dec 09 '14 at 19:41
    • That actually makes sense, because you're changing the thing you are looping over inside the loop, which could cause unpredictable results if allowed to happen. I edited my answer, thanks for the upvote! – brobas Dec 09 '14 at 19:58