How can I use the Python io module to build a memory-resident data structure?

Question

I'm trying to write data collected from a data acquisition system to locations in memory, and then asynchronously perform further processing on the data, or write it out to file for offline processing. I'm structuring it this way to isolate data acquisition from data analysis and transmittal, buying us some flexibility for future expansion and improvement, but it is definitely more complex than simply writing the data directly to a file.

Here is some exploratory code I wrote.

#io.BufferedRWPair test
from io import BufferedRWPair

# Samples of instrumentation data to be stored in RAM
test0 = {'Wed Aug  1 16:48:51 2012': ['20.0000', '0.0000', '13.5', '75.62', '8190',
                    '1640', '240', '-13', '79.40']}
test1 = {'Wed Aug  1 17:06:48 2012': ['20.0000', '0.0000', '13.5', '75.62', '8190',
             '1640', '240', '-13', '79.40']}

# Attempt to create a RAM-resident object into which to read the data.
data = BufferedRWPair(' ', ' ', buffer_size=1024)

data.write(test0)
data.write(test1)

print data.getvalue()

data.close()

There are a couple of issues here (maybe more!):

-> 'data' is a variable name that picks up a construct (outside of Python) that I'm trying to assemble -- which is an array-like structure that should hold sequential records with each record containing several process data measurements, prefaced by a timestamp that can serve as a key for retrieval. I offered this as background to my design intent, in case the code was too vague to reflect my true questions.

-> This code does not work, because the 'data' object is not being created. I'm just trying to open an empty buffer, to be filled later, but Python is looking for two objects, one readable, one writeable, which are not present in my code. Because of this, I'm not sure I'm even using the right construct, which leads to these questions:

Is io.BufferedRWPair the best way to deal with this data? I've tried StringIO, since I'm on Python 2.7.2, but no luck. I like the idea of a record with a timestamp key, hence my choice of the dict structure, but I'd sure look at alternatives. Are there other io classes I should look at instead?
One alternative I've looked at is the DataFrame construct, which is defined in the NumPy/SciPy/pandas world. It looks interesting, but it seems to require a lot of additional modules, so I've shied away from that. I have no experience with any of those modules -- should I be looking at these more complex modules to get what I need?

I'd welcome any suggestions or feedback, folks... Thanks for checking out this question!

I don't understand what problem you are trying to solve. What is "memory-resident" supposed to mean? All objects are kept in memory, and all objects can be operated on asynchronously, including the dictionaries `test0` and `test1`. They also can be written to a file in various ways. I don't find any clue in the question as to why a standard Python dictionary won't do the trick for you. — Sven Marnach, Aug 04 '12 at 11:20
@SvenMarnach: Hi Sven, Thanks for your interest and question. I used the phrase "memory resident" to differentiate this from most data acq applications which take in data, do a little processing on it (clean up, type and format revision, etc.), write the data to a hard-drive file and then repeat for the next data collection cycle. In my case, instead of writing to disk, I want to write to a memory location, so all the data will be in a memory file, not a hard-disk based file. (Continued in next comment) — Red Spanner, Aug 04 '12 at 14:56
However, I want to keep the memory allocated to this file location to a fixed size so the application doesn't explode and crash. As new data is added, I would either: write out (to disk) or dump the oldest data. Other applications will have access to this structure in memory, for analysis, display, etc., and I'm hoping to be able to use the dictionary key (a timestamp) as a search key. — Red Spanner, Aug 04 '12 at 14:56

score 3 · Accepted Answer · answered Aug 03 '12 at 23:33

If I understand what you are asking, using an in-memory SQLite database might be the way to go. SQLite allows you to create a fully functioning SQL database entirely in memory. Instead of reads and writes you would do selects and inserts.
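As a rough illustration of that idea, here is a minimal sketch using the standard-library `sqlite3` module with the special `':memory:'` database name. The table name `readings`, its columns, and the helper functions `store` and `fetch` are hypothetical, chosen only for this example; the measurements are stored as one comma-joined string to keep the sketch short, and a real schema would more likely use one column per measurement.

```python
import sqlite3

# ':memory:' asks sqlite3 for a database that lives only in RAM.
conn = sqlite3.connect(':memory:')

# Hypothetical schema: the timestamp string is the lookup key.
conn.execute('CREATE TABLE readings (ts TEXT PRIMARY KEY, payload TEXT)')


def store(conn, ts, values):
    """Insert one record; values is a list of measurement strings."""
    conn.execute('INSERT INTO readings (ts, payload) VALUES (?, ?)',
                 (ts, ','.join(values)))
    conn.commit()


def fetch(conn, ts):
    """Return the list of measurements for a timestamp, or None if absent."""
    row = conn.execute('SELECT payload FROM readings WHERE ts = ?',
                       (ts,)).fetchone()
    return row[0].split(',') if row is not None else None


# The two sample records from the question.
store(conn, 'Wed Aug  1 16:48:51 2012',
      ['20.0000', '0.0000', '13.5', '75.62', '8190',
       '1640', '240', '-13', '79.40'])
store(conn, 'Wed Aug  1 17:06:48 2012',
      ['20.0000', '0.0000', '13.5', '75.62', '8190',
       '1640', '240', '-13', '79.40'])

first = fetch(conn, 'Wed Aug  1 16:48:51 2012')
missing = fetch(conn, 'no such timestamp')
count = conn.execute('SELECT COUNT(*) FROM readings').fetchone()[0]
```

Note that an in-memory database is private to the connection that created it, so other parts of the application would need to share that connection (or the data would need to be exported) to see the records.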

Bryan Oakley

Hmm, thanks, Bryan -- Interesting idea, and one that wasn't even on my radar screen. I'll go and take a look at the module docs for sqlite. – Red Spanner Aug 04 '12 at 00:00
I've looked more closely at this, Bryan, and I'm moving forward with it; it's the easiest route to get through the 'demo' phase my project is in. Thanks! – Red Spanner Aug 05 '12 at 22:55

Sven Marnach · Answer 2 · 2012-08-06T10:26:40.020

0

Writing a mechanism to hold data in memory while it fits and only write it to a file if necessary is redundant – the operating system does this for you anyway. If you use a normal file and access it from the different parts of your application, the operating system will keep the file contents in the disk cache as long as enough memory is available.

If you want to have access to the file by memory addresses, you can memory-map it using the mmap module. However, my impression is that all you need is a standard database, or one of the simpler alternatives offered by the Python standard library, such as the shelve and anydbm modules.
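To make the `shelve` option concrete, here is a minimal sketch that treats a shelf as a persistent dictionary keyed by the timestamp string. The temporary directory and the file name `readings` are assumptions made for the example, and `contextlib.closing` is used only so the same code closes the shelf cleanly on both Python 2.7 and Python 3.

```python
import os
import shelve
import tempfile
from contextlib import closing

# Hypothetical location for the shelf file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'readings')

record = ['20.0000', '0.0000', '13.5', '75.62', '8190',
          '1640', '240', '-13', '79.40']

# Write: a shelf behaves like a dict whose keys must be strings.
with closing(shelve.open(path)) as db:
    db['Wed Aug  1 16:48:51 2012'] = record

# Read it back, as another part of the application might do later.
with closing(shelve.open(path)) as db:
    restored = db['Wed Aug  1 16:48:51 2012']
    has_other = 'Wed Aug  1 17:06:48 2012' in db
```

The values are pickled on the way in, so lists, dicts, and most other Python objects can be stored without manual serialization.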

Based on your comments, also check out key-value stores like Redis and memcached.

edited Aug 06 '12 at 10:26

answered Aug 05 '12 at 16:55

Sven Marnach

Hi Sven, Thanks for this. I'd like to keep the data resident in memory for several (planned/ future) purposes: real-time display in a web interface, real-time data analysis (trending, etc.) and potentially, process control for a sampling system interfaced to the web server. It's unclear to me whether I can do these things based solely on the O/S functions, though I'm no expert in this area (Hence my question). @BryanOakley suggested sqlite, which I'll check out, and I will look at shelve and anydbm as well. Thanks again! – Red Spanner Aug 05 '12 at 18:01
@RedSpanner: Maybe I was not specific enough. The point of this answer is: *You* do not decide what is kept in memory anyway, regardless of how you do it. The OS decides what is kept in memory. It can swap out any memory pages to disk at any time, and it will cache file contents in memory. The reason *why* you want to keep things in memory does not matter. And modern OSes usually do a good job in deciding what to keep in memory. (I'll add a few further options to my answer.) – Sven Marnach Aug 06 '12 at 10:21
Hi Sven, I'm actually testing `sqlite3` module, with `':memory:'` option, as it seems the simplest way to accomplish this for the demo I am working on. I'll wait for your options with interest, though. – Red Spanner Aug 06 '12 at 22:12
