I am new to Python and am not sure why memory usage spikes so dramatically when I use NumPy's hstack to join two pandas data frames. Performance with pandas.concat was even worse (when it finished at all), so I am using NumPy.
The two data frames are relatively large, but I have 20 GB of free RAM (11 GB is in use, including the two data frames I want to join).
The data frames a and b have shapes:
a.shape (66377, 30)
b.shape (66377, 11100)
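
For scale, here is a back-of-the-envelope estimate of the stacked result. It assumes every column is float64 (8 bytes per element), which I have not confirmed. Object or mixed dtypes would take more.

```python
# Rough size estimate for the stacked array, assuming float64 everywhere.
rows = 66377
cols_a, cols_b = 30, 11100
bytes_per_element = 8  # assumption: float64

total_elements = rows * (cols_a + cols_b)
estimated_bytes = total_elements * bytes_per_element
estimated_gb = estimated_bytes / 1e9

print(f"{total_elements} elements, about {estimated_gb:.1f} GB")
# prints: 738776010 elements, about 5.9 GB
```

Under that assumption the result alone should fit comfortably in the free memory, which is why the spike surprises me.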
When I call np.hstack((a, b)), the 20 GB of free RAM is completely used up.
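
To help narrow this down, here is a minimal sketch of the check I can run on the dtypes and sizes involved. The frames below are small random stand-ins (hypothetical data, all float64), not my real data, so the numbers only illustrate the approach.

```python
import numpy as np
import pandas as pd

# Small hypothetical stand-ins for the real frames (far fewer rows and columns).
a = pd.DataFrame(np.random.rand(1000, 30))
b = pd.DataFrame(np.random.rand(1000, 111))

# In-memory size of each frame in bytes; deep=True also counts object columns.
size_a = a.memory_usage(deep=True).sum()
size_b = b.memory_usage(deep=True).sum()
print("frame sizes (bytes):", size_a, size_b)

# Count of columns per dtype; mixed dtypes can force a wider common dtype.
print(a.dtypes.value_counts())
print(b.dtypes.value_counts())

# np.hstack converts each frame to an ndarray and copies both into a new array.
stacked = np.hstack((a, b))
print(stacked.shape, stacked.dtype, stacked.nbytes)
```

If the real frames report object or mixed dtypes here, the stacked array could be much larger than the float64 estimate suggests.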