
Q:

I know there is no single perfect answer to all of this nonsense; I am hoping for some experienced insight to narrow down the possible flavors, a general strategy for avoiding conversion nightmares, and any ideas for reducing my data-storage footprint and the load on the CPU/disk (large string operations are expensive and tedious). I am on restricted hardware and somewhat new to XML standards. I can read and write XML just fine (usually for websites), but I have never really used it to encapsulate a dataset.


I have given this weeks of thought, and I am 92.3% sure that XML files are my ideal storage destination. I am logging various instrumentation readings/analysis and holding them for months at a time. I do, however, have concerns about the limited hardware resources of my data-collection nodes (512 kB RAM, 3.2 GB flash storage; excessive string operations can get slow).

I am trying to find a well-formed markup language with a minimal footprint that can handle raw numerical datatypes. I do not need fully compliant files, but I am looking for a best-fit solution, so let's not deviate too far from proper form.

Primary Data Model Factors

(and why I think XML is a better fit than packed binary, flat text, or even CSV)

  • Up to 8 different datapoints (different measurements, brands, and sensor types)
  • Various raw datatypes (REAL32, DINT, DWORD, BYTE, STRING (arbitrarily long))
  • Datasets need to be able to keep absolute timestamps within each file (I have a directory full of hundreds of XML files that will eventually be merged)
  • Datapoint configuration/quantity could change, so I need to be able to note alterations to the schema with minimal verbosity/confusion.
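
To make the factors above concrete, here is a minimal sketch of what one compact log chunk could look like, built with Python's standard `xml.etree.ElementTree`. All element and attribute names (`log`, `schema`, `ch`, `t`, `r`, `dt`, `abs`, `v`) are hypothetical and do not come from any established schema. The idea is only that channel types are declared once per file, a version attribute flags schema changes, and each reading carries a short offset from an absolute timestamp instead of a full date-time.

```python
import xml.etree.ElementTree as ET

def build_chunk(start_iso, channels, readings, schema_version=1):
    """Build one compact log chunk (hypothetical layout, not a standard).

    channels: list of (id, type, name), declared once per file
    readings: list of (offset_seconds, {channel_id: value})
    """
    log = ET.Element("log", v=str(schema_version))
    schema = ET.SubElement(log, "schema")
    for cid, ctype, name in channels:
        ET.SubElement(schema, "ch", id=cid, type=ctype, name=name)
    # One absolute timestamp per block; readings store only an offset.
    block = ET.SubElement(log, "t", abs=start_iso)
    for offset, values in readings:
        r = ET.SubElement(block, "r", dt=str(offset))
        for cid, value in values.items():
            r.set(cid, str(value))  # channel ids must not collide with "dt"
    return ET.tostring(log, encoding="unicode")

xml_text = build_chunk(
    "2014-12-10T08:31:00Z",
    [("a", "REAL32", "temp"), ("b", "DINT", "count")],
    [(0, {"a": 21.5, "b": 3}), (10, {"a": 21.6})],
)
print(xml_text)
```

Short attribute names keep each reading small, and a reader can still recover the datatype of every value from the `schema` element. A changed datapoint configuration would start a new file (or a new `schema` element) with a bumped `v`.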

Performance Constraints/Considerations

  • I should normally only write out the XML from the embedded platform, so readability is not paramount, although if I do need to handle any kind of inquiry, tossing and parsing 3.0GB of text is not going to be fun even at its very cleanest.
    • I believe that intermittent DATE-TIME nodes will help me index such an inquiry
  • Compressing data excessively can actually become a problem at export time, because it means yet more calculations to unzip my laziness.
  • Excessively verbose XML only gives me 111 days of storage. I would like to get that up to 180 days or longer. So I do need to condense text better.
  • There are 3 potential targets once the data is offloaded. I don't want to run into conversion bottlenecks/mistakes by over-complicating.
    • Microsoft Excel (the Excel user doesn't have to understand it perfectly, but we don't want to spend hours manually importing non-compliant schema types/maps into a 2D grid)
    • RRD Backend Server (I will be able to run any conversions needed, but hopefully I am already close to what RRD wants)
    • Some cute JavaScript/Android tools. Although I expect these to perform custom datatype handling, well-formed XML will make retrieval and parsing simpler during development.
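
As a rough sketch of the storage arithmetic behind the 111-day and 180-day figures above: assuming the logging rate stays constant and the whole 3.2 GB of flash is available for logs (both are assumptions, not stated guarantees), the required shrink factor is simply the ratio of the two durations. The function names are made up for illustration.

```python
def required_size_ratio(current_days, target_days):
    """Fraction of the current per-day size that each day may occupy."""
    return current_days / target_days

def bytes_per_day(capacity_bytes, days):
    """Average daily byte budget if the capacity must last `days` days."""
    return capacity_bytes / days

CAPACITY = 3.2e9  # assumption: all 3.2 GB of flash is usable for logs

ratio = required_size_ratio(111, 180)
print(f"each day must fit in {ratio:.1%} of its current size")
print(f"current budget: {bytes_per_day(CAPACITY, 111) / 1e6:.1f} MB/day")
print(f"target budget:  {bytes_per_day(CAPACITY, 180) / 1e6:.1f} MB/day")
```

In other words, the text needs to become roughly 38% smaller per day of data to reach 180 days, which shorter element names and relative timestamps may or may not achieve on their own.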
user2097818
  • What is your question? – Michael Kay Dec 10 '14 at 08:31
  • @MichaelKay Highlighted and moved up a little. – user2097818 Dec 10 '14 at 09:46
  • I'm afraid this kind of question doesn't usually work well on StackOverflow. It needs a 2-day consultancy workshop, not the 3 minutes that SO questions usually get. – Michael Kay Dec 10 '14 at 14:24
  • @MichaelKay I do understand your point, but XML is one of those things I avoided every chance I got because I thought it was silly, and now it seems that I have no other choice. I am wary of waking the *XML Gods*. How about narrowing down [this list](http://en.wikipedia.org/wiki/List_of_XML_schemas) or adding one to it that I don't know about? That will help. I know someone has had this issue before, and I just need a good starting point before I take the plunge. – user2097818 Dec 10 '14 at 16:49
  • I'm sorry, I think it would be unprofessional to give guidance here without a careful study of your project requirements, and I don't have time for that. – Michael Kay Dec 11 '14 at 11:03

1 Answer


Did you consider storing your XML files in an XML database such as eXist?