
I am trying to load a YAML file with the PyYAML module, but I get a MemoryError. The file size seems reasonable: 28 MB. I have loaded larger files in the past without any issues. I am using 32-bit Python 2.7. Does anybody know what is going on, and can you suggest a solution? (I don't want to go down the road of splitting the YAML file.)

This is the error that I get:

>>> yaml_results_file = yaml.load(open(parent_folder+yaml_results_file_path+yaml_results_file_name, "r"))

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    yaml_results_file = yaml.load(open(parent_folder+yaml_results_file_path+yaml_results_file_name, "r"))
  File "C:\Python27\lib\site-packages\yaml\__init__.py", line 71, in load
    return loader.get_single_data()
  File "C:\Python27\lib\site-packages\yaml\constructor.py", line 39, in get_single_data
    return self.construct_document(node)
  File "C:\Python27\lib\site-packages\yaml\constructor.py", line 48, in construct_document
    for dummy in generator:
  File "C:\Python27\lib\site-packages\yaml\constructor.py", line 398, in construct_yaml_map
    value = self.construct_mapping(node)
  File "C:\Python27\lib\site-packages\yaml\constructor.py", line 208, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "C:\Python27\lib\site-packages\yaml\constructor.py", line 127, in construct_mapping
    key = self.construct_object(key_node, deep=deep)
  File "C:\Python27\lib\site-packages\yaml\constructor.py", line 99, in construct_object
    self.constructed_objects[node] = data
MemoryError
Anthon
Kevin Bell
  • Is this on Windows? Have you tried the SafeLoader, the CLoader? Is the YAML file available somewhere? If not 1) does it have anchors and aliases? 2) Does it have type tags (if not why not use safe-load)? – Anthon Nov 04 '16 at 12:31
  • This is on Windows. As part of a Python project I need to read data from a YAML file. – Kevin Bell Nov 04 '16 at 13:38
  • By the way, I switched to 64-bit Python and the problem was solved, but I still can't see why I was getting a MemoryError! Could it be due to the size of the keys in the YAML file? – Kevin Bell Nov 04 '16 at 14:11
  • The size of the file could be it, there is a lot of overhead while loading the data. – Anthon Nov 04 '16 at 16:35

1 Answer


The Python YAML module (PyYAML) is known for using excessive amounts of memory while loading, well over a hundred times the file size. In your case, a 28 MB file might well require 3 GB to 9 GB of memory.

A 32-bit process simply cannot allocate that much memory: the address space is only 4 GiB in size, and the operating system reserves part of it for the kernel (1 GiB on some systems, 2 GiB by default on 32-bit Windows). Python therefore raises a MemoryError as soon as an allocation fails.
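To make the arithmetic concrete, here is a rough sketch that checks whether the running interpreter is 32-bit and estimates the memory needed for a 28 MB file. The overhead factors (100 and 300) are assumptions chosen to roughly match the 3 GB to 9 GB range above, not measurements, and the 2 GiB figure assumes the default user address space of a 32-bit Windows process. The function names are made up for illustration.

```python
import struct

GIB = 1024 ** 3


def pointer_bits():
    """Return the pointer width of the running interpreter (32 or 64)."""
    return struct.calcsize("P") * 8


def estimated_load_bytes(file_bytes, overhead_factor):
    """Very rough estimate: file size times an ASSUMED overhead factor."""
    return file_bytes * overhead_factor


file_bytes = 28 * 1024 * 1024   # the 28 MB file from the question
user_space = 2 * GIB            # assumed: default 32-bit Windows user space

print("interpreter is %d-bit" % pointer_bits())
for factor in (100, 300):       # assumed overhead factors, not measured
    need = estimated_load_bytes(file_bytes, factor)
    print("factor %d: about %.2f GiB needed, fits in 2 GiB: %s"
          % (factor, need / float(GIB), need <= user_space))
```

Even with the lower assumed factor, the estimate exceeds what a default 32-bit Windows process can address, which is consistent with the MemoryError in the question.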

Switching to 64-bit raises the limit imposed by the address space, but it does not reduce the actual memory requirements. If you foresee parsing even bigger files, you may want to switch parsers or change the method of parsing.
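As a minimal sketch of what "switching the method of parsing" could look like while staying with PyYAML: the snippet below prefers the libyaml-backed `CSafeLoader` when the installation provides it (falling back to `SafeLoader`), and also shows event-based parsing with `yaml.parse`, which walks the document without building the full Python object tree. Whether either option lowers peak memory enough for a given file is an assumption you would need to measure; `load_yaml` and `count_events` are hypothetical helper names.

```python
import io

import yaml

# Use the C-based safe loader if PyYAML was built with libyaml,
# otherwise fall back to the pure-Python SafeLoader.
Loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)


def load_yaml(stream):
    """Load one YAML document into Python objects with the chosen loader."""
    return yaml.load(stream, Loader=Loader)


def count_events(stream):
    """Iterate over parse events without constructing Python objects."""
    return sum(1 for _ in yaml.parse(stream, Loader=Loader))


sample = u"a: 1\nb: [2, 3]\n"
print(load_yaml(io.StringIO(sample)))
print(count_events(io.StringIO(sample)))
```

With the event-based approach you handle each event as it arrives (for example, collecting only the keys you need), so memory use depends on what you keep rather than on the whole document.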

AI0867