
The function signature for pandas.read_csv includes, among others, the following options:

read_csv(filepath_or_buffer, low_memory=True, memory_map=False, iterator=False, chunksize=None, ...)

I couldn't find any documentation for either the low_memory or the memory_map flag. I am confused about whether these features are implemented yet and, if so, how they work.

Specifically,

  1. memory_map: If implemented, does it use np.memmap? If so, does it store the individual columns or the rows as memmaps?
  2. low_memory: Does it specify something like a cache to store in memory?
  3. Can we convert an existing DataFrame to a memmapped DataFrame?

P.S.: Versions of the relevant modules:

pandas==0.14.0
scipy==0.14.0
numpy==1.8.1
dhke
goofd
  • ``low_memory`` should probably be documented (though it is an older option that doesn't really do much). ``memory_map`` is not documented because it's not implemented (nor does it do anything). So the answer to all your questions is no. – Jeff Jun 16 '14 at 18:21
  • https://github.com/pydata/pandas/issues/5888 – Jeff Jun 16 '14 at 18:21
  • FYI, these are not in the public doc-strings either, so not sure where you are looking. – Jeff Jun 16 '14 at 18:22
  • I will revise slightly: ``memory_map`` is technically defined and tested. Never seen it used. Give it a try and report back. (It doesn't use ``np.memmap``; it just holds a limited amount of data in memory.) But I think this is an older / deprecated option anyhow. – Jeff Jun 16 '14 at 18:26
  • Thanks @Jeff! I did a ``help(pd.read_csv)`` to get the docstrings. Thanks for the GitHub reference. – goofd Jun 16 '14 at 18:52
  • Btw, I passed True to the two flags separately and the df loaded correctly - not sure whether it did a ``memmap`` tho' – goofd Jun 16 '14 at 18:54
  • Yeah, never saw that option before! (It defaults to False.) As I said, it might be old, as ``read_csv`` is quite efficient in memory space, so it's probably not necessary. – Jeff Jun 16 '14 at 18:56
  • Maybe relevant question: http://stackoverflow.com/questions/24251219/low-memory-option-in-read-csv – firelynx Jun 11 '15 at 09:13
  • Does this answer your question? [Pandas read_csv low_memory and dtype options](https://stackoverflow.com/questions/24251219/pandas-read-csv-low-memory-and-dtype-options) – AMC Mar 16 '20 at 18:20
  • Thanks for the comment @AMC. The accepted answer from was very helpful – goofd Apr 21 '20 at 22:09
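goofd's experiment of passing the flags one at a time can be reproduced with a minimal sketch like the one below. It assumes a recent pandas (the question was asked against 0.14.0, where behaviour may differ), and the file name and sample data are made up for illustration. Since low_memory already defaults to True, the sketch flips it to False instead, so that each read actually differs from the defaults in exactly one flag.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data. memory_map only matters when read_csv is given
# a file path, so the CSV is written to a temporary file first.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.csv")
    pd.DataFrame({"a": range(5), "b": list("vwxyz")}).to_csv(path, index=False)

    baseline = pd.read_csv(path)                               # all defaults
    with_memory_map = pd.read_csv(path, memory_map=True)       # default is False
    without_low_memory = pd.read_csv(path, low_memory=False)   # default is True

# On a small, cleanly typed file, all three reads produce the same frame.
pd.testing.assert_frame_equal(baseline, with_memory_map)
pd.testing.assert_frame_equal(baseline, without_low_memory)
print("all three reads are identical")
```

Identical results here only show that the flags don't change *what* is loaded for a simple file; they say nothing about memory use during parsing.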

1 Answer


I will attempt to sum up the comments to this question and also add my own research into one comprehensive answer.

  1. The low_memory option is more or less deprecated, in that it no longer actually does anything (source).

  2. memory_map does not seem to use NumPy's memory map, as far as I can tell from the source code. It seems to be an option for how to parse the incoming stream of data, not something that affects how the DataFrame you get back works.

  3. Since my assumption in point 2 is that memory_map is only used for parsing, question 3 (converting an existing DataFrame to a memmapped one) is kind of irrelevant.
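Points 2 and 3 can be checked with a minimal sketch, assuming a recent pandas (the asker was on 0.14.0). It reads a file with memory_map=True and then inspects what actually backs the resulting DataFrame. Recent pandas docs describe the flag as mapping the input file onto memory when a file path is passed. The file name, column names and data here are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "numbers.csv")  # hypothetical file
    pd.DataFrame({"x": np.arange(1000), "y": np.arange(1000) * 0.5}).to_csv(
        path, index=False
    )

    # memory_map=True affects how the *input file* is accessed while parsing.
    df = pd.read_csv(path, memory_map=True)

# The temporary file has been deleted at this point, yet df still works:
# parsing copied the values into ordinary in-memory NumPy arrays.
col = df["x"].to_numpy()
print(type(col).__name__, isinstance(col, np.memmap))
print(df["y"].sum())
```

If the columns were np.memmap views onto the CSV, they could not outlive the file. The fact that they do is consistent with memory_map being a parsing-time option only.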
firelynx