
I am going to read data from a table in my Oracle database and fetch it into a DataFrame in Python. The table has 22 million records, and using fetchall() takes a long time without returning any result (the same query runs in Oracle in 1 second).

I have tried slicing the data with the code below, but it is still not efficient.

import cx_Oracle
import pandas as pd

connect_serv = cx_Oracle.connect(user='', password='', dsn='')
cur = connect_serv.cursor()

table_row_count = 22242387
batch_size = 100000

sql = """select t.* from (select a.*, ROW_NUMBER() OVER (ORDER BY column1) as row_num from table1 a) t where t.row_num between :LOWER_BOUND and :UPPER_BOUND"""

data = []
# ROW_NUMBER() starts at 1, so start the range at 1
for lower_bound in range(1, table_row_count + 1, batch_size):
    cur.execute(sql, {'LOWER_BOUND': lower_bound,
                      'UPPER_BOUND': lower_bound + batch_size - 1})
    for row in cur.fetchall():
        data.append(row)

I would like to know what is the proper solution to fetch this amount of data in python in a reasonable time.

mahsa-ebrahimian
  • Do you have enough system memory to hold the complete dataset in a dataframe? – rdas Apr 17 '19 at 07:21
  • That is how databases work: a database has a limited speed at which it can transfer data, and your network connection has its limit too, so I think you have reached the speed limit. – Matthijs990 Apr 17 '19 at 07:22
  • Python is slow. Converting data into Python objects takes time. Lots of time. – freakish Apr 17 '19 at 07:25
  • Please define what "reasonable" time is. How much time is it taking? There's no way that it takes 1 second to retrieve all of that data "in Oracle". Can you define what that means as well? It would help to know what you are trying to accomplish as well. There may be better solutions. – Anthony Tuininga Apr 17 '19 at 22:43
  • It has been running for 2 hours with no result. In Oracle, when I execute the same select statement in Toad, it returns a result in 1 second. – mahsa-ebrahimian Apr 22 '19 at 13:42

2 Answers


It's not the query that is slow; it's stacking the data one row at a time with data.append(row).

Try using

data.extend(cur.fetchall())

for starters. It avoids the repeated single-row appends and instead adds the entire batch of rows returned by fetchall() in one call.
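As a rough self-contained sketch (with made-up two-column batches standing in for what cur.fetchall() would return), the two approaches produce the same list, but extend makes one call per batch instead of one per row:

```python
# Stand-ins for two batches that cur.fetchall() would return
batch1 = [(1, 'a'), (2, 'b')]
batch2 = [(3, 'c')]

rows_appended = []
for batch in (batch1, batch2):
    for row in batch:            # one append call per row
        rows_appended.append(row)

rows_extended = []
for batch in (batch1, batch2):
    rows_extended.extend(batch)  # one call per batch

assert rows_appended == rows_extended  # same result, far fewer calls
```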

Tammo Heeren

You will have to tune the arraysize and prefetchrows parameters. I was having the same issue; increasing arraysize resolved it. Choose the values based on the memory you have available.

Link: https://cx-oracle.readthedocs.io/en/latest/user_guide/tuning.html?highlight=arraysize#choosing-values-for-arraysize-and-prefetchrows
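For a sense of scale, here is a back-of-the-envelope calculation (assuming the question's row count and cx_Oracle's default arraysize of 100) of how arraysize drives the number of fetch round trips to the database:

```python
import math

TOTAL_ROWS = 22_242_387            # row count from the question
for arraysize in (100, 10_000):    # 100 is the cx_Oracle default
    round_trips = math.ceil(TOTAL_ROWS / arraysize)
    print(f"arraysize={arraysize}: ~{round_trips} fetch round trips")
```

In cx_Oracle, the setting is the `cursor.arraysize` attribute (along with `cursor.prefetchrows`), assigned before calling execute(); see the linked tuning guide for how to choose the values.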

RMS