How to use pandas_profiling with a large database table

Question

I'm trying to use pandas_profiling to profile a table. It has around 20 columns, most of them floats, and almost 3 million records.

I got the following error:

Traceback (most recent call last):
  File "V:\Python\prof.py", line 53, in <module>
    if __name__ == "__main__": main()
  File "V:\Python\prof.py", line 21, in main
    df = pd.read_sql(query, sql_conn)
  File "C:\Users\linus\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\sql.py", line 380, in read_sql
    chunksize=chunksize)
  File "C:\Users\linus\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\sql.py", line 1477, in read_query
    data = self._fetchall_as_list(cursor)
  File "C:\Users\linus\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\sql.py", line 1486, in _fetchall_as_list
    result = cur.fetchall()
MemoryError

I tried with fewer records and it worked.

Is there a way to get around this error? It looks like a memory limitation. Can this be done another way, or is it impossible with Python?

Thanks for your help.

score 0 · Answer 1 · answered Jul 04 '19 at 19:17

If you can provide enough information for us to replicate the error, we can resolve it. I would recommend opening an issue on the package's GitHub page.

Disclosure: I am a co-author of this package.
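
Note that the traceback above ends inside pd.read_sql (at cur.fetchall()), before pandas_profiling is called, so the whole result set is being materialised at once. One thing that might be worth trying is reading the table in chunks and shrinking each chunk before combining them. The following is only a minimal sketch under stated assumptions, not a confirmed fix: the helper name read_sql_in_chunks, the demo table measurements, and the choice to downcast float64 to float32 (which loses precision) are all hypothetical, and an in-memory SQLite database stands in for the real connection. Whether chunking actually lowers peak memory also depends on the database driver.

```python
import sqlite3

import pandas as pd


def read_sql_in_chunks(query, conn, chunksize=100_000, sample_frac=None, seed=0):
    """Hypothetical helper: read a query chunk by chunk to limit peak memory.

    Each chunk's float64 columns are downcast to float32 (lossy), and
    optionally only a random fraction of each chunk is kept.
    """
    parts = []
    # With chunksize set, pd.read_sql returns an iterator of DataFrames
    # instead of a single DataFrame built from the full result set.
    for chunk in pd.read_sql(query, conn, chunksize=chunksize):
        if sample_frac is not None:
            chunk = chunk.sample(frac=sample_frac, random_state=seed)
        float_cols = chunk.select_dtypes(include="float64").columns
        chunk = chunk.astype({col: "float32" for col in float_cols})
        parts.append(chunk)
    return pd.concat(parts, ignore_index=True)


# Small stand-in for the real table (assumption: in-memory SQLite demo data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (a REAL, b REAL, c REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [(i * 0.5, i * 1.5, i * 2.5) for i in range(1000)],
)
conn.commit()

df = read_sql_in_chunks("SELECT * FROM measurements", conn, chunksize=250)
print(df.shape, df.dtypes.unique())
```

The resulting df could then be passed to pandas_profiling.ProfileReport(df) as usual. If the full table still does not fit, setting sample_frac (for example 0.1) profiles a random subset instead, at the cost of approximate statistics.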

Simon