
I am trying to use all of my CPU cores. With NumPy I am able to use all of them, but when I perform an operation in pandas, only one core is used. I have tried setting the maximum thread count and other parameters, but nothing is working for me.

For example:

def a(x):
    # performs a lot of operations and checks
    ...

df.groupby(column_A).apply(a).reset_index()

While the system is executing NumPy operations, the CPU uses most of its cores, but when it comes to pandas, everything runs on a single core.

Progman
  • Does this answer your question? [Weird bug in Pandas and Numpy regarding multithreading](https://stackoverflow.com/questions/59445147/weird-bug-in-pandas-and-numpy-regarding-multithreading) – ead Aug 19 '20 at 12:29

1 Answer


Pandas' groupby is implemented largely in Python (see source), and apply calls your Python function once per group, so it is limited by CPython's GIL (Global Interpreter Lock) to single-threaded execution. Libraries can choose to handle thread safety themselves and release the GIL during intensive computations, or run native code that spawns its own 'unrestricted' threads, which is how NumPy's BLAS-backed operations can use several cores.

You can try running your code on IronPython, which doesn't have a GIL, but glancing at pandas' code I didn't see any threads, so I'm not sure it will help.
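A common workaround, not something built into pandas, is to spread the groups across worker processes, since each process has its own interpreter and its own GIL. Below is a minimal sketch, assuming the per-group function is defined at module level so it can be pickled. The body of `a`, the `value` column, and the helper `parallel_groupby_apply` are hypothetical stand-ins, not part of the question or of pandas:

```python
from concurrent.futures import ProcessPoolExecutor

import pandas as pd


def a(group):
    # Hypothetical stand-in for the question's per-group function:
    # it returns one row of results for the group as a dict.
    return {"total": group["value"].sum(), "rows": len(group)}


def parallel_groupby_apply(df, key, func, max_workers=None):
    # Hypothetical helper: materialize the groups so that each
    # sub-DataFrame can be pickled and sent to a worker process.
    keys, groups = zip(*df.groupby(key))
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so rows line up with keys.
        rows = list(pool.map(func, groups))
    out = pd.DataFrame(rows)
    out.insert(0, key, list(keys))
    return out


if __name__ == "__main__":
    # The guard is needed on platforms that start workers by
    # re-importing this module.
    df = pd.DataFrame(
        {"column_A": ["x", "y", "x", "z"], "value": [1, 2, 3, 4]}
    )
    print(parallel_groupby_apply(df, "column_A", a))
```

Starting processes and pickling every group adds overhead, so this is only likely to pay off when the work done per group is heavy compared with the size of the group.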

Egal