
I'm using pandas in Python to run a SQL query and load the result into a DataFrame, which I then write to disk with pickle. My program has been stuck at the pd.read_sql_query(sql) step for over 14 hours now, and I'm not sure what's wrong. Task Manager shows the system's memory usage at 100% and the disk usage also at 100% (45 MB/sec). This query runs against a large company database with an immense amount of data, so my assumption is that the result set is too large to fit in memory (I estimate it at around 100 gigabytes, while system RAM is only 16 gigabytes), and the disk only has 59 gigabytes of space left for the swap / pagefile.
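Roughly what the code is doing (a minimal sketch; the connection string, query, and output path below are placeholders, not the real ones):

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string and query -- the real ones point at the company database
engine = create_engine(
    "mssql+pyodbc://user:password@server/db?driver=ODBC+Driver+17+for+SQL+Server"
)
sql = "SELECT * FROM some_very_large_table"

# This is the step that has been running for 14+ hours
df = pd.read_sql_query(sql, engine)

# Execution never reaches this point
df.to_pickle("query_result.pkl")
```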

I'm on Windows 10. "My Computer" still shows the disk as having that 59 GB available, but given the constant disk activity in Task Manager, it seems that since the DataFrame no longer fits in memory, the data must be getting written to the pagefile. However, I still wouldn't expect the query to take this long, and I also don't understand why the disk would still show free space if the pagefile were actually absorbing the rest of the DataFrame. Clearly I'm missing something, since the free space on the C: drive has been reported as 59 GB all day long.

Will the query ever finish? How does Windows handle the situation where neither RAM nor the disk is large enough for the swap / pagefile? For context, the Python process is still continuously using 11% of the CPU on a 4-core / 8-thread Intel i5, so it's not just sitting there doing nothing. I just don't really understand what's going on behind the scenes given the huge query and the limited system resources.
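In case it's useful, here is a rough way to watch RAM, pagefile, and free disk space from a separate Python shell while the query runs (a sketch using psutil; this isn't part of the job itself, just monitoring):

```python
import time
import psutil

# Poll RAM, the pagefile (reported as swap on Windows), and free space on C: once a minute
while True:
    ram = psutil.virtual_memory()
    swap = psutil.swap_memory()
    disk = psutil.disk_usage("C:\\")
    print(
        f"RAM used: {ram.percent}%  "
        f"pagefile used: {swap.used / 2**30:.1f} / {swap.total / 2**30:.1f} GiB  "
        f"C: free: {disk.free / 2**30:.1f} GiB"
    )
    time.sleep(60)
```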

joejoejoejoe4
