
I've got a problem where a list of pydicom Dataset objects (i.e. DICOM images) isn't picklable, and I'm not sure why.

This is a pydicom-specific question - I tried posting it to their Google mailing list first, but for some reason their list server was failing - so excuse me for posting it here.

I'm on Anaconda with everything updated via conda (Python 3.7.4, latest pydicom, macOS).


I've got a medium-sized program I'm writing (6,000 lines) that all works fine, and now I'm attempting to use multiprocessing to speed it up. The program reads in a study, groups it by series, and then runs a set of tests on each series - so it's well suited to parallelization.

I've got things working without any problems when I spin up some processes with a set of multiprocessing.Process() calls. I've got my series images as a list of pydicom Datasets, and I pass that list to my test_series() function, along with a couple of other parameters, all without any problems.

But I was just trying to switch to using multiprocessing.Pool() instead, for convenience and easier access to the return values from my test_series() function (rather than using Queues or Pipes).


However, pool.starmap() called on my list of Datasets is failing because the list is not picklable (specifically, each Dataset in the list isn't). My knowledge of Python so far isn't helping me much in trying to figure out why, however.

I've tried diagnosing it by a) calling pickle on a Dataset as soon as it's read from disk - no problem there, it pickles fine in that case; and b) using 'get_unpicklable' from Stack Overflow to see what it is in the Dataset that's causing the problem. When I run it on one of my Datasets I get the following reported:

"[key type=DataElement]._value.type_constructor (Type 'function' caused: Can't pickle local object 'DataElement._convert_value.<locals>.<lambda>')",

"[val type=DataElement]._value.type_constructor (Type 'function' caused: Can't pickle local object 'DataElement._convert_value.<locals>.<lambda>')"
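(For reference, the essence of that diagnostic boils down to something like this - my own rough stand-in sketch, not the actual get_unpicklable recipe:)

```python
import pickle

def find_unpicklable(obj):
    """Report which instance attributes fail to pickle.

    Rough stand-in for the 'get_unpicklable' recipe, not its actual code.
    """
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:
            bad.append((name, str(exc)))
    return bad

class Holder:
    """Toy object mimicking a Dataset holding one problematic attribute."""
    def __init__(self):
        self.fine = [1, 2, 3]      # pickles without trouble
        self.broken = lambda v: v  # a local lambda cannot be pickled

print(find_unpicklable(Holder()))  # reports only the 'broken' attribute
```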

In my code I add two attributes to each Dataset, simply via <var>.attribute = new_value:

  1. A "file_name" string (this was before I looked at the Dataset dict and saw it already had a filename - doh)
  2. A new numpy array of pixel data, _scaled_pixel_data, which is the _scaled_pixel_array scaled up in size (I increased the size 4x for my processing)

(PS - I didn't subclass the pydicom Dataset, as I'm new-ish to Python and it was faster just to add attributes by hand.)

There are no problems anywhere else, and also no problems when I call multiprocessing.Process() passing my list as an argument - the problem only occurs when trying to use pool.starmap().

Does anyone have any idea what's up?

Should I have subclassed the pydicom Dataset with my two new attributes, rather than simply adding attributes manually?
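(For concreteness, what I mean by subclassing is roughly this - an untested sketch using a plain stand-in class rather than the real pydicom.dataset.Dataset:)

```python
import pickle

class Dataset:
    """Plain stand-in; in my real code this would be pydicom.dataset.Dataset."""
    pass

class MyDataset(Dataset):
    """Subclass carrying my two extra attributes explicitly."""
    def __init__(self, file_name=None, scaled_pixel_data=None):
        super().__init__()
        self.file_name = file_name
        self._scaled_pixel_data = scaled_pixel_data

ds = MyDataset(file_name="/tmp/blah.dcm")
# Attributes declared on the subclass round-trip through pickle.
round_tripped = pickle.loads(pickle.dumps(ds))
print(round_tripped.file_name)  # /tmp/blah.dcm
```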

thanks for any help! Richard


2 Answers


Lambda functions cannot be pickled; this was pointed out in issue 951 and is currently being worked on.
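A minimal illustration of the failure mode (a mimic of the pattern, not pydicom's actual code):

```python
import pickle

class DataElementLike:
    """Mimics the issue: a lambda defined inside a method and stored on the
    instance is a 'local object' that the pickle module refuses to handle."""
    def __init__(self, value):
        self.value = value
        self.type_constructor = lambda v: v  # local lambda, as in the error above

try:
    pickle.dumps(DataElementLike(42))
except (pickle.PicklingError, AttributeError) as exc:
    # The message names the local lambda, much like the one reported above.
    print(exc)
```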

As an aside, regarding the comment about the mailing list (Google group): it is moderated for new members - you should have received a message to wait for moderation.

darcymason
  • Thanks Darcy! So if I add my attributes (basically an interpolated larger version of the pixel data) via extending Dataset to a new class, then it should be OK, as I'm not modifying Dataset? – Richard Sep 21 '19 at 06:06
  • Yes, I suppose, but I think shortly there should be a solution which allows pickling of modified datasets. – darcymason Sep 21 '19 at 16:52
  • Great - thanks for the update, and I'll be able to wait – Richard Sep 22 '19 at 04:29

This problem is related to pydicom 1.3.0, which changed how pydicom.Dataset derives from dict. See the following issue: https://github.com/pydicom/pydicom/issues/947

A fix was merged into master at the end of September 2019. You can use the latest version on the master branch, or downgrade to 1.2.2 while waiting for 1.3.1.
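For example (the commands assume pip; the git URL is the official pydicom repository):

```shell
# Downgrade to the last release before the change:
pip install pydicom==1.2.2

# ...or install the current master branch, which contains the fix:
pip install git+https://github.com/pydicom/pydicom.git
```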

RomaneG
  • Hi Romane - quick Q: I downgraded to 1.2.2, and passing pickled datasets around into multiprocessing works fine. But in my use, I first add some additional attributes to the datasets, e.g. dataset.file_name = '/tmp/blah.dcm'. When I do that, these new attributes are gone when I pass the dataset into multiprocessing (I presume they don't survive pickling). So my question is: is adding new attributes a Bad Way to do things in a Python sense? Should I instead be subclassing Dataset and adding my own attributes that way? – Richard Oct 29 '19 at 23:56
  • Hi Richard, have you tried using the current version of pydicom on the master branch? As mentioned in the other comment from @darcymason, this has been resolved in issue 951. – RomaneG Oct 30 '19 at 20:28
  • Thanks Romane! - no I hadn't yet - was going to wait till 1.3.1. Actually got it working last night as I wanted with 1.2.2. – Richard Oct 31 '19 at 05:41