
I am working with sample JSON data from a dummy JSON site (JSONPlaceholder) that looks like this:

[
  {
    "postId": 1,
    "id": 1,
    "name": "id labore ex et quam laborum",
    "email": "Eliseo@gardner.biz",
    "body": "laudantium enim quasi est quidem magnam voluptate ipsam eos\ntempora quo necessitatibus\ndolor quam autem quasi\nreiciendis et nam sapiente accusantium"
  },
  {
    "postId": 1,
    "id": 2,
    "name": "quo vero reiciendis velit similique earum",
    "email": "Jayne_Kuhic@sydney.com",
    "body": "est natus enim nihil est dolore omnis voluptatem numquam\net omnis occaecati quod ullam at\nvoluptatem error expedita pariatur\nnihil sint nostrum voluptatem reiciendis et"
  }
]

I read the API data in through the requests library, then turn around and send it into an Exasol DB table. See the code below:

import requests
import pyexasol

def get_api_data():
    # Fetch the comments from the placeholder API and parse the JSON body
    r = requests.get("http://jsonplaceholder.typicode.com/comments")
    data = r.json()
    return data

def connection():
    # Connection details live in the 'my_exasol' section of the local config file
    session = pyexasol.connect_local_config('my_exasol')
    return session

def send_api_data():
    s = connection()
    data = get_api_data()
    # One INSERT statement (and one round trip) per record
    for row in data:
        s.execute("""INSERT INTO TESTBED.TEST_API(postId, id, name, email, body)
                     VALUES ({postId}, {id}, {name}, {email}, {body})""",
                  {'postId': row['postId'], 'id': row['id'], 'name': row['name'],
                   'email': row['email'], 'body': row['body']})

send_api_data()

This works fine; the problem is that it is incredibly slow. It takes almost 2 minutes to insert 500 records. I know there must be a more efficient way to do this. In reality I will be pulling data from an API with thousands of records that I will want to send into a DB table.

Any ideas on a better approach to take?

JD2775

1 Answer


Executing single INSERT statements is slow in Exasol because it is a column-oriented database. You should use IMPORT instead. Make sure to read the [best practices for pyexasol](https://github.com/badoo/pyexasol/blob/master/docs/BEST_PRACTICES.md) as well. Also consider enabling compression.
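
Compression can be switched on directly via the `compression` flag of `pyexasol.connect`. A minimal sketch; the DSN and credentials below are placeholders, not values from the question:

import pyexasol

# Hypothetical connection details -- replace with your own DSN and credentials.
# compression=True turns on transport compression, which helps most over
# slower links such as Wi-Fi or a VPN.
session = pyexasol.connect(
    dsn='exasol-host:8563',
    user='sys',
    password='exasol',
    schema='TESTBED',
    compression=True,
)

If you stick with a local config file, you can pass the same flag as an extra keyword argument, e.g. `pyexasol.connect_local_config('my_exasol', compression=True)`.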

For your example, try the following. It takes 0.7 seconds to import the data in my case.

import requests
import pyexasol
import time

def get_api_data():
    r = requests.get("http://jsonplaceholder.typicode.com/comments")
    data = r.json()
    return data

def connection():
    session = pyexasol.connect_local_config('my_exasol')
    return session

def send_api_data():
    s = connection()
    data = get_api_data()

    # Reshape the list of dicts into a list of tuples,
    # ordered like the columns of the target table
    data_for_import = [(row['postId'], row['id'], row['name'], row['email'], row['body']) for row in data]
    start = time.time()
    # One bulk IMPORT instead of one INSERT per record
    s.import_from_iterable(data_for_import, ("TESTBED", "TEST_API"))
    print("Finished import after ", time.time() - start, " seconds")

send_api_data()
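
If you would rather keep the records in a pandas DataFrame, pyexasol also offers `import_from_pandas`, which does the same bulk IMPORT. A minimal sketch, reusing the session `s` and the parsed `data` from above:

import pandas as pd

def send_api_data_pandas(s, data):
    # Build a DataFrame whose columns match the target table's column order
    df = pd.DataFrame(data, columns=['postId', 'id', 'name', 'email', 'body'])
    # Bulk IMPORT straight from the DataFrame
    s.import_from_pandas(df, ("TESTBED", "TEST_API"))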
sirain
  • This is fantastic, thank you! In my case it took about 3.5 seconds, but I think that is because I am going through a VPN to a DB that resides on a server. Still, 3.5 seconds is way better than 2 minutes :) – JD2775 Jul 26 '21 at 15:20
  • Great to hear that it worked for you! Did you [enable compression](https://github.com/badoo/pyexasol/blob/master/docs/BEST_PRACTICES.md#enable-compression-for-wifi-connections)? That might improve the speed even more. – sirain Jul 27 '21 at 06:23
  • Good call, thanks. I will add that as well. – JD2775 Jul 27 '21 at 15:24