
I'd like to reference the following post as well, and mention that I'm familiar with BioPython.

How to obtain random access of a gzip compressed file

I'm familiar with Bio.bgzf's support for indexing and random-access reads. I'm building a library that uses the module to index the blocks containing the data I care about. The technology is very interesting, but I'm struggling to understand the pace of development and the limitations of Bio.bgzf, or of the BGZF format itself.

Can Bio.bgzf overwrite a specific line in the file, just as it can read from a virtual offset to the end of the line? If it can, would the new data need to be exactly the same number of bytes as the old?

After using make_virtual_offset() to obtain the position in the .bgzf file of a line that I'd like to overwrite, I'm looking for a method like filehandle.writeline() to replace that line within its block with some new text. If that's not possible, is it possible to get the coordinates of the entire block and rewrite the block instead? And if not, is it fair to say that BGZF indexes are only useful for reading?
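For context, a minimal sketch of how a BGZF virtual offset encodes block coordinates. The helper names `pack_virtual_offset` and `unpack_virtual_offset` are hypothetical stand-ins written in plain Python; they are meant to mirror the arithmetic behind Bio.bgzf's `make_virtual_offset()` and `split_virtual_offset()`, not to replace them.

```python
def pack_virtual_offset(block_start: int, within_block: int) -> int:
    """Combine a block's file offset and an offset inside its decompressed data."""
    if not 0 <= within_block < 2**16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start < 2**48:
        raise ValueError("block start must fit in 48 bits")
    # Upper 48 bits: where the compressed block starts in the file.
    # Lower 16 bits: position inside that block once decompressed.
    return (block_start << 16) | within_block


def unpack_virtual_offset(virtual_offset: int):
    """Recover (block_start, within_block) from a virtual offset."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


# Hypothetical coordinates: a line starting 10 bytes into the block at file offset 100000.
voffset = pack_virtual_offset(100000, 10)
block_start, within_block = unpack_virtual_offset(voffset)
```

Because the file offset of the block is baked into the upper bits, any edit that changes a block's compressed size shifts every later block, so the virtual offsets stored for those blocks no longer point at the right place.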

  • Just a few thoughts (I'm no expert): if you don't want the easy way (recreating the whole file), you would have to alter the whole block that contains the line of interest. Stream to a new file, something like this: read the binary data up to the block start, decompress the block, modify it, compress the modified block, copy the rest, and (optionally) create an index for the new file. I would say that BGZF is very useful when you need to access small portions of many different data sets without knowing in advance the order in which you'll need them. – Marek Schwarz Feb 11 '20 at 08:05
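The steps in the comment above can be sketched with the standard library alone. This is an illustration under stated assumptions, not Bio.bgzf's API: the helpers `pack_block`, `unpack_block`, and `rewrite_block` are hypothetical names, the whole file is held in memory rather than streamed, and each block is assumed to carry only the standard "BC" extra subfield (as files written by bgzip or Bio.bgzf normally do).

```python
import gzip
import struct
import zlib

# Fixed first 16 bytes of a BGZF block: gzip magic, deflate, FEXTRA flag,
# zero mtime, XFL, OS, XLEN=6, then the "BC" subfield tag with a 2-byte payload.
BGZF_HEADER = b"\x1f\x8b\x08\x04\x00\x00\x00\x00\x00\xff\x06\x00" + b"BC" + b"\x02\x00"
MAX_BLOCK_DATA = 65280  # assumed cap so the compressed block still fits in 64 KiB


def pack_block(data):
    """Compress data into a single BGZF block."""
    if len(data) > MAX_BLOCK_DATA:
        raise ValueError("too much data for one block")
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no zlib wrapper
    cdata = comp.compress(data) + comp.flush()
    bsize = len(cdata) + 25  # BSIZE field = total block length minus one
    tail = struct.pack("<II", zlib.crc32(data), len(data))
    return BGZF_HEADER + struct.pack("<H", bsize) + cdata + tail


def unpack_block(raw, start):
    """Return (decompressed data, compressed block length) for the block at start."""
    if raw[start:start + 4] != b"\x1f\x8b\x08\x04" or raw[start + 12:start + 14] != b"BC":
        raise ValueError("no BGZF block header at this offset")
    (bsize,) = struct.unpack_from("<H", raw, start + 16)
    length = bsize + 1
    data = zlib.decompress(raw[start + 18:start + length - 8], -15)
    crc, isize = struct.unpack_from("<II", raw, start + length - 8)
    if crc != zlib.crc32(data) or isize != len(data):
        raise ValueError("corrupt BGZF block")
    return data, length


def rewrite_block(raw, block_start, edit):
    """Copy bytes before the block, re-compress the edited block, copy the rest."""
    data, length = unpack_block(raw, block_start)
    return raw[:block_start] + pack_block(edit(data)) + raw[block_start + length:]


# Two small blocks; replace a line in the first with a longer one.
original = pack_block(b"alpha\nbeta\n") + pack_block(b"gamma\n")
patched = rewrite_block(
    original, 0, lambda d: d.replace(b"beta\n", b"a much longer line\n")
)
text = gzip.decompress(patched)  # still a valid multi-member gzip stream
```

The replacement line does not need to match the old length, because the whole block is re-compressed. The cost is that every block after the edited one moves, so an index of virtual offsets built for the old file has to be rebuilt. A real file also ends with an empty EOF block, which this sketch simply carries along in the untouched tail.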

0 Answers