
I am working on a simulation engine using Python where I collect a lot of metrics. The simulation runs at a high speed and generates around 100K events/second (I can do some processing by consolidating these events on a per second basis). I am looking for a mechanism to record these metrics as a time series.

My requirements are:

  1. I would like to have this logging mechanism in the same process as the simulation as opposed to an external process such as Graphite

  2. The mechanism must be able to handle 100K events/second without slowing down the simulation.

  3. I would like to store the data for each metric at the following resolutions: 1-second granularity for 60 minutes, 1-minute granularity for 1 day, 5-minute granularity for 2 days, 1-hour granularity for 6 months, and 1-day granularity for 3 years. I would like the mechanism itself to consolidate the data according to these ranges.

  4. Ideally, I want to maintain one file that holds the metrics information for one simulation run. For another run of the simulation a separate file would have to be created.

  5. It would be nice to have a well-tested library/module that is readily available :)

BTW, I took a cursory look at RRDTool but from what I understand it seems like the Python library is a thin wrapper around the RRDTool binary. I'm looking for a tighter integration if possible.

TIA

Prashanth Ellina
  • Do you have any attempted code or previous things you have looked into that you can show us? – Ryan Saxe May 06 '13 at 22:07
  • I've used http://graphite.wikidot.com/ in the past. But as I mentioned in my question text this is an external process to my simulation and I would prefer otherwise. Right now I'm reading the documentation for RRDTool - http://oss.oetiker.ch/rrdtool/doc/rrdtool.en.html. I have not written any code for this at the moment as I am still in the process of figuring which way to go. – Prashanth Ellina May 06 '13 at 22:10
  • http://oss.oetiker.ch/rrdtool/prog/rrdpython.en.html seems to be a prospective solution. Can anyone with experience in using this comment on its appropriateness for my use-case? – Prashanth Ellina May 06 '13 at 22:17
  • https://pypi.python.org/pypi/PyRRD - This looks like a more Pythonic way to access RRDTool's functionality. – Prashanth Ellina May 06 '13 at 22:42
  • Unfortunately I don't know the documentation and cannot help you, but this is not the kind of thing you ask on stack overflow. You should ask questions about why your code is not working, or how you could add something to do something, but not I want to do this, please point me in the right direction or tell me what modules to look into. You need to figure that one out on your own. – Ryan Saxe May 06 '13 at 23:08
  • I can understand if this is not the kind of question that is appropriate for the focus area of Stackoverflow. If that is the case I will gladly post elsewhere but I find it hard to agree with your other statement that I can't ask to be pointed in the right direction. What is wrong with asking people on the internet about their prior experience which can help me save significant amount of time by avoiding repeating the same experiments? – Prashanth Ellina May 06 '13 at 23:17
  • The reason I am telling you is not to deprive you, it is because it is in the [faq](http://stackoverflow.com/faq), and your question will likely be closed. If you are to ask to be pointed in the right direction, it should be accompanied with coding attempts and examples. That way it goes under the "a specific programming problem" section – Ryan Saxe May 06 '13 at 23:21
  • I did read the FAQ and I see this "practical, answerable problems that are unique to the programming profession". Also, it says having "source code" as a part of the question is preferable but not mandatory as long the question pertains to the realm of programming. As I see it, my question is answerable, related to programming and it is a practical problem that I am trying to solve. I don't see why this is not appropriate. – Prashanth Ellina May 06 '13 at 23:26
  • sorry @RyanSaxe but i completely disagree with you and i think these types of questions spawn some really great answers. programming is not just about a `foreach` but also the initial design/concept of it all. – au_stan May 08 '13 at 11:48
  • @PrashanthEllina what exactly are you logging? can you give an example? – au_stan May 08 '13 at 13:16
  • I am simulating an Artificial Life environment where virtual organisms live and die. I keep track of metrics like number of births, number of deaths per unit time (as per sim). I have about 15 such data points that need to be logged. Birth, Death and other events happen at the rate of 100K/s. Currently I am aggregating the stats once per sim tick (approx 2 seconds). Example of the values looks like this {'tick': 1, 'ts': , 'births': xxxx, 'deaths': xxx, 'dpoint3': xxx, 'dpoint3': xxx, .... ,'dpointn': xxx} – Prashanth Ellina May 08 '13 at 22:27

1 Answer


The functionality provided by RRDTool fits my requirements. Initially I found a Python library, https://pypi.python.org/pypi/python-rrdtool/, and misunderstood the nature of its integration. I thought it executed the RRDTool binary as a separate process, but the documentation says it is a proper Python wrapper that invokes the functionality within the same process space.

Later on I found another Python library (https://pypi.python.org/pypi/PyRRD) that wraps RRDTool's functionality in a more Pythonic, object-oriented fashion, which I found comfortable to work with. The documentation on the linked page was good, so I faced no roadblocks in using it.

This link (http://www.vandenbogaerdt.nl/rrdtool/tutorial/rrdcreate.php) was helpful in figuring out how to configure the RRD database during creation.
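As an illustration of that configuration step, here is a minimal sketch that derives round-robin archive (RRA) definitions from the retention tiers listed in the question. It is not the code used in the project. It assumes a base step of 1 second, the AVERAGE consolidation function, an xff of 0.5, "6 months" taken as 183 days, and "3 years" taken as 1095 days; the names `TIERS` and `rra_definitions` are hypothetical.

```python
# Sketch: build rrdtool RRA definition strings ("RRA:CF:xff:steps:rows")
# from the retention tiers in the question. All constants are assumptions.

STEP_SECONDS = 1  # assumed base step: one consolidated sample per second

MINUTE, HOUR, DAY = 60, 3600, 86400

# Each tier is (resolution in seconds, retention period in seconds).
TIERS = [
    (1, 60 * MINUTE),        # 1 s granularity for 60 minutes
    (MINUTE, 1 * DAY),       # 1 min granularity for 1 day
    (5 * MINUTE, 2 * DAY),   # 5 min granularity for 2 days
    (HOUR, 183 * DAY),       # 1 h granularity; assumes 6 months = 183 days
    (DAY, 3 * 365 * DAY),    # 1 day granularity; assumes 3 years = 1095 days
]


def rra_definitions(tiers, step=STEP_SECONDS, cf="AVERAGE", xff=0.5):
    """Return one RRA definition string per retention tier."""
    definitions = []
    for resolution, retention in tiers:
        steps = resolution // step    # base samples consolidated into one row
        rows = retention // resolution  # rows needed to cover the retention period
        definitions.append(f"RRA:{cf}:{xff}:{steps}:{rows}")
    return definitions


RRA_DEFS = rra_definitions(TIERS)

for definition in RRA_DEFS:
    print(definition)
```

With the python-rrdtool bindings installed, these strings could then be passed to a call along the lines of `rrdtool.create("run1.rrd", "--step", "1", "DS:births:GAUGE:2:0:U", *RRA_DEFS)`. The file name, the data source name, the GAUGE type, and the 2-second heartbeat are placeholders that would need adjusting to the actual metrics.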

Prashanth Ellina
  • It is totally acceptable to accept your own answer. After 6 years, did you move on to some other method or did you stick with this approach? – Peter Moore Mar 06 '20 at 16:05
  • @PeterMoore I am no longer working on the project where I had this requirement. However, I do remember this approach matching the needs of the project. Had no issues. – Prashanth Ellina Mar 16 '20 at 22:52