1

I do not want to extract the files to disk. I want to keep the final .txt in memory and parse it there. I can't find any way to do this using memoize in Python 2.7.

.zip -> .gz -> .txt(data needs to be parsed)

My second choice is to unzip to disk and then parse the .txt file data. Any thoughts?

Vikas Periyadath
  • I think you can unzip part of the zip, related question here: https://stackoverflow.com/questions/339053/how-do-you-unzip-very-large-files-in-python – scriptboy Feb 09 '18 at 06:35
  • @scriptboy Unzipping is OK. Basically, I want to avoid extracting to disk; I want to keep the data in memory and parse the text file from there. – Vikas Periyadath Feb 09 '18 at 06:37
  • But how can you get a `.txt` if you don't unzip? – scriptboy Feb 09 '18 at 06:39
  • What about write to io.BytesIO? https://docs.python.org/2/library/io.html#buffered-streams – Haochen Wu Feb 09 '18 at 06:41
  • @HaochenWu I just want to know how we can solve it using memoize. Anyway, I will read through your link carefully, because I don't know much about buffering. Thanks – Vikas Periyadath Feb 09 '18 at 06:46
  • Do you have more context? Memoize is mostly used to store the return value of a function for some fixed args. You will still need something to hold the return value, and I suggest using io.BytesIO for that here. – Haochen Wu Feb 09 '18 at 06:50

1 Answer

2

You can unzip the file and write it to an io.BytesIO object, which is essentially an in-memory file.

https://docs.python.org/2/library/io.html#buffered-streams

You can then use any method that works on a regular file, such as read, seek, etc.
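As a quick illustration of that file-like behaviour, here is a minimal sketch; the sample bytes are made up for the example and nothing touches the disk:

```python
import io

# An in-memory binary buffer that behaves like a file opened in binary mode.
buffer = io.BytesIO()
buffer.write(b"first line\nsecond line\n")

# Rewind to the start before reading, exactly as with a real file.
buffer.seek(0)
first = buffer.readline()  # reads up to and including the first newline
rest = buffer.read()       # reads everything that is left
```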

This way you get a virtual file that works for any format. If you are certain that text is the only thing you are going to handle, the io module also provides pure text streams, such as io.StringIO.
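For the .zip -> .gz -> .txt chain in the question, a rough sketch could look like the following. It assumes the zip holds one or more members ending in `.gz`, that each one is a gzip-compressed text file, and that the text is UTF-8; the function name `iter_text_lines` and the `encoding` parameter are hypothetical, chosen only for this example. It sticks to standard-library calls that exist in both Python 2.7 and Python 3.

```python
import gzip
import io
import zipfile


def iter_text_lines(zip_source, encoding="utf-8"):
    """Yield decoded text lines from every .gz member of a zip archive.

    zip_source may be a path or a file-like object. Nothing is extracted
    to disk: each member is read into memory and decompressed from there.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.endswith(".gz"):
                continue  # assumption: only .gz members are of interest
            # Compressed bytes of the inner .gz file, held in memory.
            gz_bytes = archive.read(name)
            # Wrap the bytes in BytesIO so gzip can treat them as a file.
            gz_file = gzip.GzipFile(fileobj=io.BytesIO(gz_bytes))
            try:
                for raw_line in gz_file:
                    yield raw_line.decode(encoding).rstrip("\r\n")
            finally:
                gz_file.close()
```

Calling `for line in iter_text_lines("archive.zip"): ...` would then hand each line to your parser. Note that `archive.read(name)` loads the whole compressed member into memory, so this suits files that comfortably fit in RAM.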

Haochen Wu