
I am using bitarrays in one of my projects to store bits. The input is a file containing "0"s and "1"s plus time information that I need; it can also contain unnecessary fields such as comments, which I filter out while reading. Now I want to parse the bits, so I need a parser that can parse bits (bitarrays).

Currently I am using the parsimonious parser to parse strings (i.e. "0"s and "1"s), but it seems this parser only takes strings as input. Is this the case for all parsers? Or can I write grammar rules that match Python bitarrays/bitstrings, lists, or any other non-string data structure?

So if I want to parse bits (bitarrays), what would be the best way to do it?

Example:

I have a string "011000111100011010" in a file. Since each "0" in the string is a character, it takes 8 bits in memory. That wastes memory (I need only one bit to store a 0), so I'm planning to store the bits in bitarrays.

Say I want to match a frame (18 bits long) that can be represented as follows:

frame(18bits) = field1(6bits) field2(2bits) field3(5bits) field4(5bits)

So how can I write a simple grammar rule that matches these bits?

This is how I do it with parsimonious (here I treat each "0" as a character):

frame_matcher = field1 field2 field3 field4
field1 = ~"[01]{6}"
field2 = ~"[01]{2}"
field3 = ~"[01]{5}"
field4 = ~"[01]{5}"

This is just an example; in reality the scenario is much more complex, and the file is very large (~1GB). So I am looking for a data structure that can store bits (not characters) and a parser that can parse bits (not characters) in Python.
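For comparison, the fixed-width frame above can be split without a grammar at all. The following is only a minimal sketch using the standard library: it packs the 18 characters into one integer and extracts each field with shifts and masks. The names `FRAME_LAYOUT` and `parse_frame` are hypothetical and not part of parsimonious or bitarray.

```python
# Hypothetical layout copied from the example frame: (name, width in bits).
FRAME_LAYOUT = [("field1", 6), ("field2", 2), ("field3", 5), ("field4", 5)]


def parse_frame(bits, layout=FRAME_LAYOUT):
    """Split a '0'/'1' string into integer fields, leftmost field first."""
    total = sum(width for _, width in layout)
    if len(bits) != total or set(bits) - {"0", "1"}:
        raise ValueError("expected %d characters of 0/1" % total)
    value = int(bits, 2)          # pack the whole frame into one integer
    fields = {}
    remaining = total
    for name, width in layout:
        remaining -= width        # number of bits to the right of this field
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    return fields


frame = parse_frame("011000111100011010")
print(frame)
```

Each field comes back as an unsigned integer, so the 18 characters are reduced to four small numbers. Different frame types could be described by different layout lists, although this sketch does not cover variable-length fields.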

sophros
user2109788
  • You have a 1GB file which is just the characters `0` and `1`? Seems wasteful... you should pack the bits before storing to reduce file sizes. – nneonneo Jun 10 '14 at 06:19
  • Also, parsimonious is far and away overkill for parsing bitfields. Learn to pack bits, and then use bit shifting to extract bits. – nneonneo Jun 10 '14 at 06:21
  • Is your file binary data or character data? If it's character data, you are not reading bytes. – Burhan Khalid Jun 10 '14 at 06:27
  • @nneonneo : I have edited my question. The file can contain any data, but I filter out the useful information and parse that. The file is input supplied by the user, so I have no control over it. What else could the input be, then? – user2109788 Jun 10 '14 at 06:31
  • Why don't you just read all the bits into a bitarray and use slicing then...? – nneonneo Jun 10 '14 at 06:32
  • @BurhanKhalid : The file is a normal ".txt" or ".log" file. – user2109788 Jun 10 '14 at 06:33
  • @nneonneo : There are different types of frames (with different types of fields), so I thought grammars would handle the scenario easily. I have also succeeded in parsing strings this way, but performance-wise it is very slow. So I thought of storing the bits as bitarrays and am looking for a way to parse them with something similar to parsimonious grammar rules. Is it possible to parse non-string data structures (including bitarrays) in Python? – user2109788 Jun 10 '14 at 06:41
  • Grammars may be an OK way to handle the problem, but they are still way overkill IMHO. Writing a parser for bitarrays is not hard, since everything is usually in fixed width (so all you have to do is keep track of where you are in the bitarray). – nneonneo Jun 10 '14 at 06:44
  • What else do you suggest? There can be hundreds of grammar rules, and some of the fields can be of variable length. Can you please give me an idea (ideally with an example) of how I can parse bitarrays? Maybe you can add it as an answer to this question. – user2109788 Jun 10 '14 at 06:51
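
The suggestion in the comments (pack the bits, then keep track of where you are) might look roughly like the sketch below. It uses only the standard library rather than the bitarray package, and `pack_bits`, `BitReader`, and the length-prefixed frame in the usage lines are hypothetical names and formats invented for illustration. The per-bit loop is written for clarity, not speed, so it would likely need optimizing for a ~1GB input.

```python
def pack_bits(text):
    """Pack a '0'/'1' string into bytes, zero-padding the last byte."""
    padded = text + "0" * (-len(text) % 8)
    if not padded:
        return b""
    return int(padded, 2).to_bytes(len(padded) // 8, "big")


class BitReader:
    """Reads unsigned integers of any bit width from packed bytes, MSB first."""

    def __init__(self, data, nbits=None):
        self.data = data
        # nbits lets the caller exclude the zero padding in the last byte.
        self.nbits = len(data) * 8 if nbits is None else nbits
        self.pos = 0              # current position, in bits

    def read(self, width):
        if self.pos + width > self.nbits:
            raise EOFError("not enough bits left")
        value = 0
        for i in range(self.pos, self.pos + width):
            bit = (self.data[i >> 3] >> (7 - (i & 7))) & 1
            value = (value << 1) | bit
        self.pos += width
        return value


# Fixed-width frame from the question: 6 + 2 + 5 + 5 bits.
text = "011000111100011010"
reader = BitReader(pack_bits(text), nbits=len(text))
fixed = [reader.read(w) for w in (6, 2, 5, 5)]

# Hypothetical variable-length frame: a 3-bit length, then that many payload bits.
text2 = "011" + "101"
reader2 = BitReader(pack_bits(text2), nbits=len(text2))
length = reader2.read(3)
payload = reader2.read(length)
```

Because `read` takes the width as an argument, a field whose size depends on an earlier field is handled by reading the earlier field first, which is the case a fixed regular expression per field does not cover directly.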
