Questions tagged [bzip2]

bzip2 is a Unix command used for compression and decompression of files. The main advantage of bzip2 is that it has a high compression ratio with reasonable speed.

bzip2 is one of the most widely used free compression programs for the terminal.

It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast as those techniques at compression and six times faster at decompression.

The current version is 1.0.6, released 20 Sept 2010.
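As a rough illustration of the compression and decompression described above, here is a minimal sketch using Python's standard `bz2` module, which wraps the same algorithm. The sample payload is hypothetical, and the ratio achieved depends entirely on the input data.

```python
import bz2

# Hypothetical sample payload; highly repetitive data compresses very well.
original = b"bzip2 example line\n" * 1000

# compresslevel=9 selects the largest block size (900 kB), like `bzip2 -9`.
compressed = bz2.compress(original, compresslevel=9)

# Decompression restores the exact original bytes.
restored = bz2.decompress(compressed)

print(len(original), "bytes ->", len(compressed), "bytes")
```

Every bzip2 stream starts with the magic bytes `BZh`, which is a quick way to check whether a file is likely bzip2-compressed.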

327 questions
0
votes
1 answer

Why can't I seem to read an entire compressed file from a URL stream?

I'm trying to parse Wiktionary dumps on the fly, directly from the URL, in Java. The Wiki dumps are distributed as compressed BZIP2 files, and I am using the following approach to attempt to parse them: String fileURL =…
Nickersoft
  • 684
  • 1
  • 12
  • 30
0
votes
1 answer

How can I detect the overflow of Gzip and Bzip2 files during writing?

I need to detect when a file exceeds a specified size while it is being written. When I reach the specified size, I must stop writing and throw an exception. Example for a normal file: $handle = fopen($filename, 'wb'); while (true) { //…
ghost404
  • 299
  • 2
  • 16
0
votes
1 answer

decompress and compress to another format on the fly

I have a big gzip file that I need to convert to bzip2. The obvious way is to 1) decompress the file in memory, 2) write it to disk, 3) read the file again, compress it to bzip2 and write it to disk. Now I'm wondering if it's possible to avoid the…
MehrdadAP
  • 417
  • 4
  • 11
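One possible way to avoid the intermediate file described in this question is to stream the data in chunks from a gzip reader straight into a bzip2 writer. The sketch below assumes Python and its standard library; `gzip_to_bzip2`, the file names, and the chunk size are hypothetical and are not taken from the question or its answer.

```python
import bz2
import gzip
import os
import shutil
import tempfile


def gzip_to_bzip2(src_path, dst_path, chunk_size=1024 * 1024):
    """Re-compress a gzip file as bzip2 without storing the plain data on disk."""
    with gzip.open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        # Only about chunk_size bytes of decompressed data are held in memory at once.
        shutil.copyfileobj(src, dst, chunk_size)


# Small self-check with throwaway files (hypothetical names).
workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, "sample.gz")
bz2_path = os.path.join(workdir, "sample.bz2")
payload = b"some repetitive data\n" * 10000

with gzip.open(gz_path, "wb") as f:
    f.write(payload)

gzip_to_bzip2(gz_path, bz2_path)

with bz2.open(bz2_path, "rb") as f:
    roundtrip = f.read()
```

On a Unix shell, a pipeline such as `gzip -dc big.gz | bzip2 > big.bz2` follows the same idea, with the pipe taking the place of the chunked copy.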
0
votes
1 answer

Google dataflow only partly uncompressing files compressed with pbzip2

seq 1 1000000 > testfile bzip2 -kz9 testfile mv testfile.bz2 testfile-bzip2.bz2 pbzip2 -kzb9 testfile mv testfile.bz2 testfile-pbzip2.bz2 gsutil cp testfile gs://[bucket] gsutil cp testfile-bzip2.bz2 gs://[bucket] gsutil cp testfile-pbzip2.bz2…
Fernet
  • 173
  • 1
  • 10
0
votes
2 answers

How to add complete tree structure into a .tar.bz2 file with Perl?

I am looking to compress a lot of data spread across loads of sub-directories into an archive. I cannot simply use built-in tar functions because I need my Perl script to work in a Windows as well as a Linux environment. I have found the…
Bram Vanroy
  • 27,032
  • 24
  • 137
  • 239
0
votes
0 answers

Unzipping large bzip2 file with google dataflow

I have a bunch of mysql dumps compressed with bzip2 on google cloud storage. I would like to uncompress them. I tried using a pipeline defined like this: p.apply(TextIO .Read .from("gs://bucket/dump.sql.bz2") …
0
votes
1 answer

rsync folders where the target folder has the same files, only already compressed

I am at an impasse with my knowledge about bash scripting and rsync (over SSH). In my use case there is a local folder with log files in it. Those log files are rotated every 24 hours and receive a date-stamp in their filename (e.g. logfile.DATE) while…
scheuri
  • 93
  • 1
  • 1
  • 5
0
votes
1 answer

process bz2 file and process using awk

I have a file called "text.bz2" which contains a number of records which I want to process. I have a script which successfully processes all the data in a standard text file and outputs the results to a different "results.txt" file, but the command…
villaman
  • 65
  • 3
0
votes
3 answers

python3.6 - can't install modules '_sqlite3' & '_bz2'

I'm using a Fedora distribution and working with python 3.6. While importing both nltk and sklearn it says I'm missing said 2 modules (respectively). I tried fixing it by first downloading these modules using: sudo yum install sqlite-devel and sudo…
user3765713
  • 133
  • 2
  • 13
0
votes
1 answer

Help compiling seek-bzip2 on Windows

I cannot get James Taylor's excellent little "seek-bzip2" to compile under Windows. It can index a bzip2 archive and then use that index to provide random access to the individual blocks of the archive. It's written in C and requires 64 bit long…
hippietrail
  • 15,848
  • 18
  • 99
  • 158
0
votes
3 answers

Does any mainstream compression algorithm natively support streaming data

Does any mainstream compression algorithm, for example either snappy, zlib or bzip natively support streaming data across the network? For example if I have to send a compressed payload, then will I have to manually prepend the size of the payload…
Curious
  • 20,870
  • 8
  • 61
  • 146
0
votes
1 answer

Python 2 Zip bzip2 support

Is there any way to extract ZIP files that use bzip2 compression in Python 2? The native ZipFile module doesn't support bzip compression. Is there any way around it, or an alternative method?
Xyand
  • 4,470
  • 4
  • 36
  • 63
0
votes
1 answer

In Linux, how to archive and compress multiple files into one and remove source files?

I have the files below in a directory: file001 file002 . . file009. I need to compress them into one and remove the original/source files (file001 .. file009) so I can free up some disk space. This is what I did: Archived all files into one using below…
Hasan Rumman
  • 577
  • 1
  • 6
  • 16
0
votes
1 answer

Ingest bzip2 files with Apache Flume

I need to ingest files compressed with bzip2. Is it possible using Flume? I have tried it with a spooling directory and BlobDeserializer, but the data is unreadable at the sink. Thanks in advance!
alisson
  • 1
  • 3
0
votes
1 answer

unzipping bzip file using bash

I am trying to unzip bzip file using bash this way tmp1 = #(bzcat all.tbz) echo tmp1 | tar x But this fails with tar: Unrecognized archive format tar: Error exit delayed from previous errors. But if I do this bzcat all.tbz | tar x and that…
Abdul Rahman
  • 1,294
  • 22
  • 41