
Context:

I've got Python processes running on the same container and I want to be able to share a read-only key-value object between them.

I'm aware I could use something like Redis to share that info, but I'm looking for the best solution with regard to both latency and memory usage.

My idea was to generate a binary object on disk and open that file using mmap.

Question: Is there a binary format or library that would load a read-only file into RAM and offer a dictionary interface, without needing to deserialize the file content? That way I could mmap the file in every process, all processes would reuse the same RAM for that file, and I would be able to access the file's content through a dict-like interface.

I'm looking for a file/object format for dictionaries, similar to what Parquet is to columnar storage, that could be consumed in read-only mode by Python.
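To make the idea concrete, here is a minimal sketch of what such a file could look like. It is not an existing format or library: the layout (fixed-width keys sorted on disk, one unsigned 64-bit value per key), the constants `KEY_SIZE` and `RECORD`, and the names `write_table` and `MmapTable` are all assumptions made up for illustration. Lookups binary-search the mapped bytes directly, so nothing is deserialized up front, and read-only mappings of the same file are normally backed by the same page-cache pages across processes.

```python
import mmap
import os
import struct
import tempfile

KEY_SIZE = 16                              # hypothetical fixed key width, in bytes
RECORD = struct.Struct(f"<{KEY_SIZE}sQ")   # padded key + one unsigned 64-bit value


def write_table(path, items):
    """Write {bytes: int} as fixed-width records sorted by padded key."""
    records = []
    for key, value in items.items():
        if len(key) > KEY_SIZE:
            raise ValueError(f"key longer than {KEY_SIZE} bytes: {key!r}")
        records.append((key.ljust(KEY_SIZE, b"\0"), value))
    records.sort()
    with open(path, "wb") as f:
        for padded_key, value in records:
            f.write(RECORD.pack(padded_key, value))


class MmapTable:
    """Read-only, dict-like view over a file produced by write_table."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # ACCESS_READ maps the file read-only; length 0 maps the whole file.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._count = len(self._map) // RECORD.size

    def __len__(self):
        return self._count

    def __getitem__(self, key):
        if len(key) > KEY_SIZE:
            raise KeyError(key)
        target = key.ljust(KEY_SIZE, b"\0")
        lo, hi = 0, self._count
        # Binary search over the sorted records, reading only the keys probed.
        while lo < hi:
            mid = (lo + hi) // 2
            offset = mid * RECORD.size
            candidate = self._map[offset:offset + KEY_SIZE]
            if candidate < target:
                lo = mid + 1
            elif candidate > target:
                hi = mid
            else:
                return RECORD.unpack_from(self._map, offset)[1]
        raise KeyError(key)

    def close(self):
        self._map.close()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "table.bin")
write_table(path, {b"alpha": 1, b"beta": 2, b"gamma": 3})
table = MmapTable(path)
print(table[b"beta"])
```

A real format would also need variable-length keys and values, a header with a version, and an atomic way to swap in a new file, none of which this sketch attempts.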

Damien
  • If it's read-only, why not just read once into a `dict` before forking to all processes in a way that they will have access to the referenced `dict`? Really though your approach has certain issues, [please refer to comments attached to the question of this thread](https://stackoverflow.com/questions/26449645/proper-mmap-use-python). – metatoaster Feb 21 '23 at 02:43
  • @metatoaster. This will work on Unix, but not on Mac. Unix uses fork() to create new processes, but Mac creates a new process. – Frank Yellin Feb 21 '23 at 03:03
  • @FrankYellin OP didn't specify Mac, though if OP really does want to share the same memory across processes they might also want to consider [`multiprocessing.shared_memory`](https://docs.python.org/3/library/multiprocessing.shared_memory.html) instead. – metatoaster Feb 21 '23 at 03:09
  • Good idea @metatoaster. What limits me right now is that my processes are launched by gunicorn, so unless I fork the code, I can't put the logic there. Also, I mentioned the data would be immutable, but I was thinking of loading a new file every hour with the most up-to-date info. – Damien Feb 21 '23 at 03:11
  • Well, if `gunicorn` is already being used to launch _new_ Python processes, [this discussion](https://stackoverflow.com/questions/27240278/sharing-memory-in-gunicorn) (and all the linked threads on the side-bar) may be more relevant - you may consider having a dedicated process that will provide the data, but that's kind of replicating redis... – metatoaster Feb 21 '23 at 03:17
  • I tried to clarify the question. I'm really looking for a binary file/object storage for dictionaries/Map with a python interface. – Damien Feb 21 '23 at 13:26
  • you should use a sqlite file instead and connect to it using a regular sql client from all processes – Asad Awadia Feb 28 '23 at 03:50
  • If I understand your suggestion @AsadAwadia, you are proposing to load the sqlite DB using https://www.sqlite.org/mmap.html to get RAM speed with no disk access? – Damien Mar 14 '23 at 14:04
  • No. I am saying you don't need mmap at all. Just use sqlite and it will be more than sufficient for your needs. If it is not then ping me again and we can go from there – Asad Awadia Mar 15 '23 at 15:29

1 Answer


If the file is small, read it fully into memory in every process.

Otherwise, use SQLite.
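A minimal sketch of the SQLite route, using only the standard-library `sqlite3` module. The table name `kv`, the helper `build_store`, and the wrapper class `ReadOnlyStore` are hypothetical names chosen for illustration. One process writes the file once; each worker then opens it read-only through a `mode=ro` URI and does point lookups by primary key.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def build_store(path, items):
    """Write the key-value table once, from the producer process."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE kv (key TEXT PRIMARY KEY, value BLOB) WITHOUT ROWID"
            )
            conn.executemany("INSERT INTO kv VALUES (?, ?)", items.items())
    finally:
        conn.close()


class ReadOnlyStore:
    """Dict-like, read-only view over the kv table."""

    def __init__(self, path):
        # mode=ro asks SQLite to open the database file read-only.
        uri = Path(path).resolve().as_uri() + "?mode=ro"
        self._conn = sqlite3.connect(uri, uri=True)

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def close(self):
        self._conn.close()


path = os.path.join(tempfile.mkdtemp(), "kv.sqlite")
build_store(path, {"alpha": b"1", "beta": b"2"})

store = ReadOnlyStore(path)
print(store["alpha"])
print(store.get("missing"))
```

Each process holds its own connection, and the operating system's page cache keeps the hot pages of the file in memory once, so the data is not duplicated per process in the way a per-process `dict` would be.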

> My idea was to generate a binary object on disk and open that file using mmap

You should not be using mmap.

Asad Awadia