
I am going to read data from a table in my Oracle database and fetch it into a DataFrame in Python. The table has 22 million records, and using fetchall() takes a long time without returning any result (the same query runs in Oracle in 1 second).

I have tried slicing the data with the code below, but it is still not efficient.

import cx_Oracle
import pandas as pd

connect_serv = cx_Oracle.connect(user='', password='', dsn='')
cur = connect_serv.cursor()

table_row_count = 22242387
batch_size = 100000

sql = """select t.* from (select a.*, ROW_NUMBER() OVER (ORDER BY column1) as row_num from table1 a) t where t.row_num between :LOWER_BOUND and :UPPER_BOUND"""

data = []
# ROW_NUMBER() starts at 1, so start the range at 1
for lower_bound in range(1, table_row_count + 1, batch_size):
    cur.execute(sql, {'LOWER_BOUND': lower_bound,
                      'UPPER_BOUND': lower_bound + batch_size - 1})
    for row in cur.fetchall():
        data.append(row)

I would like to know what is the proper solution to fetch this amount of data in python in a reasonable time.

mahsa-ebrahimian
  • Do you have enough system memory to hold the complete dataset in a dataframe? – rdas Apr 17 '19 at 07:21
  • That is how databases work: a database has a limited speed at which it can transfer data, and your network connection has its limit too, so I think you have reached the speed limit. – Matthijs990 Apr 17 '19 at 07:22
  • Python is slow. Converting data into Python objects takes time. Lots of time. – freakish Apr 17 '19 at 07:25
  • Please define what "reasonable" time is. How much time is it taking? There's no way that it takes 1 second to retrieve all of that data "in Oracle". Can you define what that means as well? It would help to know what you are trying to accomplish as well. There may be better solutions. – Anthony Tuininga Apr 17 '19 at 22:43
  • It has been running for 2 hours with no result. In Oracle, when I execute the same select statement in Toad, it returns a result in 1 second. – mahsa-ebrahimian Apr 22 '19 at 13:42

2 Answers


It's not the query that is slow; it's stacking the data one row at a time with data.append(row).

Try using

data.extend(cur.fetchall())

for starters. It avoids the repeated single-row appends and instead adds the entire batch of rows returned by fetchall() in one call.
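As a rough self-contained sketch (with made-up two-column batches standing in for what cur.fetchall() would return), the two approaches produce the same list, but extend makes one call per batch instead of one per row:

```python
# Stand-ins for two batches that cur.fetchall() would return
batch1 = [(1, 'a'), (2, 'b')]
batch2 = [(3, 'c')]

rows_appended = []
for batch in (batch1, batch2):
    for row in batch:            # one append call per row
        rows_appended.append(row)

rows_extended = []
for batch in (batch1, batch2):
    rows_extended.extend(batch)  # one call per batch

assert rows_appended == rows_extended  # same result, far fewer calls
```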

Tammo Heeren

You will have to tune the arraysize and prefetchrows parameters. I was having the same issue; increasing arraysize resolved it. Choose the values based on the memory you have available.

Link: https://cx-oracle.readthedocs.io/en/latest/user_guide/tuning.html?highlight=arraysize#choosing-values-for-arraysize-and-prefetchrows
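For a sense of scale, here is a back-of-the-envelope calculation (assuming the question's row count and cx_Oracle's default arraysize of 100) of how arraysize drives the number of fetch round trips to the database:

```python
import math

TOTAL_ROWS = 22_242_387            # row count from the question
for arraysize in (100, 10_000):    # 100 is the cx_Oracle default
    round_trips = math.ceil(TOTAL_ROWS / arraysize)
    print(f"arraysize={arraysize}: ~{round_trips} fetch round trips")
```

In cx_Oracle, the setting is the `cursor.arraysize` attribute (along with `cursor.prefetchrows`), assigned before calling execute(); see the linked tuning guide for how to choose the values.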

RMS