I have seen similar questions, but none that matches exactly, and I can't figure out what is actually wrong with this Python code:

import tarfile

tar_file = tarfile.open('something.tgz', mode="r|gz")
txt_file = tar_file.extractfile('inner.txt')
lines = txt_file.readlines()    
txt_file.close()
tar_file.close()

It raises StreamError: seeking backwards is not allowed, apparently triggered by the readlines() call.
That seems strange to me, and I am trying to understand what I am missing.

StarterKit

1 Answer

The problem is with this line:

tar_file = tarfile.open('something.tgz', mode="r|gz")

According to the tarfile.open() docs, the mode should be either "r" (open for reading with transparent compression, recommended) or "r:gz" (open for reading with gzip compression). Using the pipe character | instead opens the archive as a stream, which the docs describe like this:

Use this variant in combination with e.g. sys.stdin, a socket file object or a tape device. However, such a TarFile object is limited in that it does not allow random access

That lack of random access is where you ran into problems with readlines() and seek(). When I changed the pipe | to a colon :, your code worked fine.
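
For reference, here is a minimal sketch of both approaches. It builds a throwaway archive first so that it can run anywhere. The helper names (make_sample_archive, read_lines_random_access, read_lines_streaming) and the sample file contents are made up for illustration and are not part of tarfile. The streaming variant assumes you really do need "r|gz", for example when reading from a pipe, and only reads the member at the moment the iteration reaches it.

```python
import io
import os
import tarfile
import tempfile


def make_sample_archive(path):
    """Create a small gzip-compressed tar with one text member (demo data only)."""
    payload = b"first line\nsecond line\n"
    info = tarfile.TarInfo(name="inner.txt")
    info.size = len(payload)
    with tarfile.open(path, mode="w:gz") as tar:
        tar.addfile(info, io.BytesIO(payload))


def read_lines_random_access(path, member_name):
    """Open with "r:gz" (colon), which allows seeking, and read a member by name."""
    with tarfile.open(path, mode="r:gz") as tar:
        member_file = tar.extractfile(member_name)
        return member_file.readlines()


def read_lines_streaming(path, member_name):
    """Open with "r|gz" (pipe) and read the member while moving forward only."""
    with tarfile.open(path, mode="r|gz") as tar:
        for member in tar:
            if member.name == member_name:
                # Read right away: a stream cannot come back to this member later.
                return tar.extractfile(member).readlines()
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    archive_path = os.path.join(tmp_dir, "something.tgz")
    make_sample_archive(archive_path)
    lines_a = read_lines_random_access(archive_path, "inner.txt")
    lines_b = read_lines_streaming(archive_path, "inner.txt")
    print(lines_a)
    print(lines_b)
```

Note that readlines() returns bytes here, because extractfile() hands back a binary file object. Decode the lines if you need str.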

MattDMo
  • `txt_file.seek(0)` gives `AttributeError: '_Stream' object has no attribute 'seekable'` – StarterKit Aug 31 '21 at 11:45
  • @StarterKit see the accepted answer in the link in my comment above. Basically, you'll need to convert the `Stream` object into an `io.StringIO` object, then you can use `readlines()` (or `read()` or `seek()` or any other file object method) on it. – MattDMo Aug 31 '21 at 11:50
  • Yes, I saw that answer, but I probably misunderstood it. I see that they use `tmpfile = BytesIO()` there, but as I understand it, that was done to store the data from FTP locally. Later they open this `tmpfile` with `tarfile`, and the code is very similar to mine. Could you elaborate a bit more on how it differs from my code? I can't clearly see it. – StarterKit Aug 31 '21 at 11:56
  • @StarterKit ok, I figured it out. Please see my edited answer. – MattDMo Aug 31 '21 at 13:30
  • Thanks! This was my fault. I rarely use the mode argument this way and copy-pasted it incorrectly. Thanks again for explaining my mistake. – StarterKit Aug 31 '21 at 15:16