I would like to use the re module with streams, though not necessarily file streams, at minimal development cost.
For file streams, there is the mmap module, which can impersonate a string and as such can be used freely with re.
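For reference, here is a minimal sketch of the mmap approach I mean, assuming the data lives in a regular, non-empty file on disk. The helper name find_in_file and the sample file are only for illustration.

```python
import mmap
import os
import re
import tempfile


def find_in_file(path, pattern):
    """Run re.findall over a file's contents through a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping is read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # re accepts the mmap object directly because it is bytes-like.
            return re.findall(pattern, mm)


if __name__ == "__main__":
    # Hypothetical sample file, created only for this demonstration.
    with tempfile.TemporaryDirectory() as tmp_dir:
        sample_path = os.path.join(tmp_dir, "sample.bin")
        with open(sample_path, "wb") as sample:
            sample.write(b"abc aaa bbb def")
        print(find_in_file(sample_path, rb"[abc]{3}"))
```

The pattern has to be a bytes pattern, since the mapped content is bytes rather than str.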
Now I wonder how mmap manages to craft an object that re can reuse. If I pass an arbitrary object, re protects itself against incompatible objects by raising TypeError: expected string or bytes-like object. So I thought I'd create a class that derives from str or bytes and overrides a few methods such as __getitem__ (this intuitively fits Python's duck typing philosophy), and make them interact with my original stream. However, this doesn't seem to work at all: my overrides are completely ignored.
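To illustrate the rejection, here is a small sketch with a purely duck-typed class. DuckString and try_search are hypothetical names made up for this example. As far as I can tell, re only accepts str or objects that expose the buffer protocol, so sequence methods alone are not enough.

```python
import re


class DuckString:
    """Hypothetical wrapper that looks like a byte sequence but is not one."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]


def try_search(pattern, candidate):
    """Return the match object, or the TypeError that re raised."""
    try:
        return re.search(pattern, candidate)
    except TypeError as exc:
        return exc


if __name__ == "__main__":
    # A real bytes object is accepted and produces a match object.
    print(try_search(b"aaa", b"abc aaa bbb def"))
    # The duck-typed wrapper is rejected with a TypeError.
    print(repr(try_search(b"aaa", DuckString(b"abc aaa bbb def"))))
```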
Is it possible to create such a "lazy" string in pure Python, without C extensions? If so, how?
A bit of background to rule out alternative solutions:
- Can't use mmap (the stream contents are not a file)
- Can't dump the whole thing to the HDD (too slow)
- Can't load the whole thing into memory (too large)
- Can seek, know the size and compute the content at runtime
Example code that demonstrates how bytes resists modification:
import re

class FancyWrapper(bytes):
    def __init__(self, base_str):
        pass  # super() isn't called, and yet the code below finds abc, aaa and bbb

print(re.findall(b'[abc]{3}', FancyWrapper(b'abc aaa bbb def')))
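A second sketch makes the bypass more visible by counting calls to an overridden __getitem__. CountingBytes is a hypothetical name used only here. My understanding is that bytes is immutable and receives its content in __new__, which would explain why skipping super().__init__() changes nothing, and that re reads the underlying buffer directly instead of going through the Python-level methods.

```python
import re


class CountingBytes(bytes):
    """Hypothetical bytes subclass that counts calls to __getitem__."""

    calls = 0

    def __getitem__(self, index):
        CountingBytes.calls += 1
        return super().__getitem__(index)


data = CountingBytes(b'abc aaa bbb def')
matches = re.findall(b'[abc]{3}', data)

# The matches are found, yet the override was never invoked by re.
print(matches, CountingBytes.calls)  # [b'abc', b'aaa', b'bbb'] 0

# Indexing from Python code does go through the override.
first_byte = data[0]
print(first_byte, CountingBytes.calls)  # 97 1
```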