
I want to read an xz file efficiently, so I think decompressing it is not a good choice.

Is there any method I can use to read an xz file without decompressing it in C++?

I know zlib is a great tool for reading gz files, but it can't be used for xz files.

I found I can open an xz file in vim and it displays fine, but when I use std::ifstream with getline in C++, I get garbled output. Can anyone explain this?

xyhuang
  • Maybe you have a slight misunderstanding about the purpose of compression? Of course you need to decompress the data in one way or another before you can read the actual information. – Lukas-T Sep 15 '20 at 09:34
  • @churill Thanks. I think compressing a file is meant to save space; it would be great if it could be read directly. – xyhuang Sep 15 '20 at 09:37
  • Don't you see the conflict between "compressed to save space" and "read directly"? Maybe study the technical details of compression from the implementation point of view. I found the Lempel-Ziv algorithm very easy to understand. Read up on it. Then I expect you do see the "direct reading" problem. – Yunnosch Sep 15 '20 at 09:40
  • To put it differently, your impression that you can "directly read" a compressed file via vim is wrong and misleading. What you miss is that vim DOES decompress the file; it just does not necessarily tell you. – Yunnosch Sep 15 '20 at 09:41
  • Yes, it's to save space. But all the information is encoded in such a way that it takes less space. Notice the word encoded here: only through decompression can you restore the original information. To give you an example: you order a chair and get a huge package. You can just sit on the package (that's what you are trying to do), or you can unpack it and build a real chair from its contents. The latter seems more useful, right? – Lukas-T Sep 15 '20 at 09:42
  • Just an anecdote for fun: a patent office has accepted a patent on an algorithm which is able to compress **any** kind of input by at least 1 byte. If you understand the topic you should be laughing now. (For those interested, the reasoning behind it is "Well, I will just store part of it elsewhere...") – Yunnosch Sep 15 '20 at 09:44
  • @churill Yes, I see. I think zlib API functions such as gzread() misled me. Thank you. – xyhuang Sep 15 '20 at 09:45
  • @Yunnosch Thank you, I think I get it. – xyhuang Sep 15 '20 at 09:47
  • There is not much gain from manually compressing a text file; the file system driver can do that efficiently and transparently nowadays. – Christopher Yeleighton Sep 15 '20 at 10:14

2 Answers

Compression is an invertible process of turning one sequence of bytes into another, hopefully shorter¹. Decompression is the inverse of that process. So of course if you have an already compressed sequence of bytes then you have to decompress it in order to recover the content. There's no way around it, and thus a performance hit is unavoidable. So the answer to

Is there any method I can use to read an xz file without decompressing it in C++?

is simply "no", and it doesn't matter whether you use C++ or anything else.

As for

I found I can open an xz file in vim and it displays fine.

Yes, because vim decompresses the file (presumably in memory) under the hood for you. It just doesn't tell you about it.


¹ Fun fact: mathematics tells us that every lossless compression algorithm that makes some input shorter must make some other input longer. Compression algorithms rely on the fact that what we compress has some nice patterns inside, e.g. repeated words. That's also why compressing the same data over and over doesn't (and never will) keep shrinking it.
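The first step of the counting argument behind this footnote (the pigeonhole principle) can be checked in a few lines. This is an illustrative sketch, not taken from any library, and the function names are hypothetical: there are 2^n distinct inputs of exactly n bits, but only 2^n - 1 distinct bit strings shorter than n bits (lengths 0 through n-1), so an invertible compressor cannot map every n-bit input to a shorter output.

```cpp
#include <cstdint>

// Number of distinct bit strings of exactly n bits: 2^n (valid for n < 64).
std::uint64_t strings_of_length(unsigned n) {
    return std::uint64_t{1} << n;
}

// Number of distinct bit strings strictly shorter than n bits:
// 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 (valid for n < 64).
std::uint64_t strings_shorter_than(unsigned n) {
    std::uint64_t total = 0;
    for (unsigned len = 0; len < n; ++len) total += strings_of_length(len);
    return total;
}
```

Since `strings_shorter_than(n)` is always one less than `strings_of_length(n)`, at least two n-bit inputs would have to share a shorter output if all of them shrank, and then decompression could not tell them apart.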

freakish

An xz file is not a text file: it contains compressed bytes, not characters, so reading it line by line with getline just gives you garbage. You have to read the raw bytes (with fread, or a std::ifstream opened in std::ios::binary) and run them through an xz decoder. xz is open source, so if your tool is open source too, you can just grab their code and adapt it to your needs.
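To see why getline produces garbage, it helps to look at the first bytes of the file. A minimal sketch, relying only on the .xz container format's 6-byte header magic (0xFD, '7', 'z', 'X', 'Z', 0x00); the function name `has_xz_magic` is hypothetical. It reads raw bytes through a std::ifstream opened in std::ios::binary. The very first byte, 0xFD, is not a printable character, and everything after the header is compressed binary data, which is what shows up as garbled output.

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <string>

// Returns true if the file at `path` starts with the .xz header magic bytes.
bool has_xz_magic(const std::string& path) {
    static const std::array<unsigned char, 6> magic{0xFD, '7', 'z', 'X', 'Z', 0x00};
    std::ifstream in(path, std::ios::binary);  // binary mode: no newline translation
    std::array<char, 6> header{};
    if (!in.read(header.data(), header.size())) return false;  // missing file or too short
    for (std::size_t i = 0; i < magic.size(); ++i) {
        if (static_cast<unsigned char>(header[i]) != magic[i]) return false;
    }
    return true;
}
```

A check like this is a cheap way to reject a non-xz file before handing its bytes to a decoder.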