6

I'm building a simple content-based recommendation system. To compute the cosine similarity in a GPU-accelerated way, I'm using PyTorch.

When creating the TF-IDF vocabulary tensor from a csr_matrix, it raises the following RuntimeError:

RuntimeError: [enforce fail at CPUAllocator.cpp:56] posix_memalign(&data, gAlignment, nbytes) == 0. 12 vs 0

I'm doing it this way:

import numpy as np
import torch

coo = tfidf_matrix.tocoo()
values = coo.data
indices = np.vstack((coo.row, coo.col))

i = torch.LongTensor(indices)
v = torch.FloatTensor(values)
tfidf_matrix_tensor = torch.sparse.FloatTensor(i, v, torch.Size(coo.shape)).to_dense()
# Raises the RuntimeError

I tried with a small test dataset (tfidf matrix size = 10,296) and it works. The tfidf matrix from the real dataset has size (27639, 226957).

room13
  • This is likely a bug in PyTorch and is better resolved by asking on [github issues](https://github.com/pytorch/pytorch/issues?page=4&q=is%3Aissue+is%3Aopen). – Jatentaki May 24 '19 at 10:41

3 Answers

2

I tried the same piece of code that was throwing this error with an older version of PyTorch. It said that I needed more RAM, so it's not a PyTorch bug. The only solution is to reduce the matrix size somehow.
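A back-of-the-envelope calculation (using the matrix shape from the question, and assuming float32 elements) shows why the `.to_dense()` call runs out of memory:

```python
# Rough size of the dense allocation that posix_memalign is asked for.
# Shape taken from the question; float32 = 4 bytes per element.
rows, cols = 27639, 226957
bytes_needed = rows * cols * 4
print(f"{bytes_needed / 1024**3:.1f} GiB")  # ~23.4 GiB for the dense tensor alone
```

Unless the machine has well over 23 GiB of free RAM, the dense conversion cannot succeed, which matches the allocator failure in the error message.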

Andrew Sklyar
0

I was having the same issue converting small NumPy matrices, and the fix was to use torch.tensor instead of torch.Tensor. I'd imagine once you do that, you can cast to the specific type of tensor you want.
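A minimal sketch of that switch (the input array here is just an illustrative example): `torch.Tensor` is the legacy float32 constructor and always copies to float32, while `torch.tensor` infers the dtype from the input, after which you can cast explicitly.

```python
import numpy as np
import torch

arr = np.array([[0.0, 1.5], [2.0, 0.0]])  # NumPy defaults to float64

t = torch.tensor(arr)        # dtype inferred from the array: float64
t32 = t.to(torch.float32)    # explicit cast if float32 is what you need
```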

John Targaryen
0

A bit tangential: in my case, I ran into this issue while running the DGL implementation of GraphSAGE. I was working on a Twitter network graph of around 10M nodes and was using the raw Twitter user id as my node id. I realised that the CPU was running out of memory when trying to map these to the long dtype, so I remapped my ids to 0, 1, ..., and the issue was resolved.
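A hypothetical sketch of that remapping (the ids below are made up): replace large, sparse user ids with a dense range 0..N-1 before building the graph, keeping the mapping around so results can be translated back.

```python
# Map arbitrary large ids to a contiguous 0..N-1 range.
raw_ids = [901234567890, 123456789012, 555000111222]  # illustrative values

id_map = {uid: i for i, uid in enumerate(sorted(set(raw_ids)))}
dense_ids = [id_map[uid] for uid in raw_ids]
print(dense_ids)  # [2, 0, 1]

# Inverse map, to recover the original ids from results.
inv_map = {i: uid for uid, i in id_map.items()}
```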

Sarah Masud