Questions tagged [bz2]

For issues relating to bz2, the file extension used for files compressed with bzip2.

Files compressed with bzip2 are usually given the .bz2 extension. Use bunzip2 (equivalent to bzip2 -d) to decompress them.
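
Since most questions under this tag use Python, here is a minimal sketch of the same compress/decompress round trip using the standard-library bz2 module. The file name and sample data are made up for illustration only.

```python
import bz2
import os
import tempfile

# Hypothetical sample data and file name, used only for this illustration.
original = b"example payload\n" * 1000

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt.bz2")

    # Write a bzip2-compressed file.
    with bz2.open(path, "wb") as f:
        f.write(original)

    # Read it back; bz2.open decompresses transparently.
    with bz2.open(path, "rb") as f:
        restored = f.read()

    # Size on disk of the compressed file, for comparison with the input.
    compressed_size = os.path.getsize(path)
```

For large files, iterating over the object returned by bz2.open reads one line at a time instead of loading the whole file into memory.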

tar supports bzip2 through the -j option, which can be used to create or extract archives compressed with bzip2.
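
As a rough Python counterpart to tar -j, the standard-library tarfile module accepts the "w:bz2" and "r:bz2" modes. The sketch below assumes a throwaway temporary directory and a hypothetical file called notes.txt.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical input file, created only for this illustration.
    src = os.path.join(tmp, "notes.txt")
    with open(src, "w") as f:
        f.write("hello bzip2\n")

    # Create a bzip2-compressed tar archive (comparable to: tar -cjf).
    archive = os.path.join(tmp, "notes.tar.bz2")
    with tarfile.open(archive, "w:bz2") as tar:
        tar.add(src, arcname="notes.txt")

    # Extract it into a separate directory (comparable to: tar -xjf).
    out_dir = os.path.join(tmp, "out")
    os.makedirs(out_dir)
    with tarfile.open(archive, "r:bz2") as tar:
        names = tar.getnames()
        tar.extractall(out_dir)

    # Read the extracted copy back to confirm the round trip.
    with open(os.path.join(out_dir, "notes.txt")) as f:
        extracted_text = f.read()
```

Only extract archives you trust this way; an archive from an unknown source can contain member paths that point outside the target directory.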

See also the bzip2 tag.

106 questions
1
vote
2 answers

How do I remove bytestrings left over from decompression from a string?

I have a bunch of strings which are sentences that look something like this: Having two illnesses at the same time is known as \xe2\x80\x9ccomorbidity\xe2\x80\x9d and it can make treating each disorder more difficult. I encoded the original string…
Peter Charland
  • 409
  • 6
  • 18
1
vote
0 answers

Pandas: Read random sample of data using read_json

I would like to read in a random sample of a large .bz2 file. Similarly to how you would read in a sample of csv like this: import pandas import random n = 1000000 #number of records in file s = 10000 #desired sample size filename =…
HarriS
  • 605
  • 1
  • 6
  • 19
1
vote
1 answer

Anaconda (Jupyter) doesn't see a package previously installed from a .tar.bz2 file

I am trying to work with Anaconda3-2019.07. I installed the mxnet library offline from a .tar.bz2 file, because the server I use has no internet connection. To do this I entered: conda install --offline mxnet-1.2.1-h8cc8929_0.tar.bz2 The…
1
vote
2 answers

Limit on bz2 file decompression using python?

I have numerous files that are compressed in the bz2 format and I am trying to uncompress them in a temporary directory using python to then analyze. There are hundreds of thousands of files so manually decompressing the files isn't feasible so I…
BenT
  • 3,172
  • 3
  • 18
  • 38
1
vote
1 answer

python: extracting a .bz2 compressed file from a torrent file

I have a .torrent file that contains a .bz2 file. I am sure that such a file is actually in the .torrent because I extracted the .bz2 with utorrent. How can I do the same thing in python instead of using utorrent? I have seen a lot of libraries for…
GRquanti
  • 527
  • 8
  • 23
1
vote
3 answers

How to get the time needed for decompressing large bz2 files?

I need to process large bz2 files (~6G) using Python, by decompressing them line by line with BZ2File.readline(). The problem is that I want to know how much time is needed to process the whole file. I did a lot of searches, tried to get the actual…
1
vote
1 answer

EOFError: compressed file ended before the logical end-of-stream was detected in decompressing bz2 file

I get this error when I try to decompress a Wikipedia dump to use its .xml file. How can I solve it? filepath='/Data/nlp/ESA/Wiki-ESA-master' file_name='enwiki-latest-pages-articles.xml.bz2' zipfile = bz2.BZ2File(file_name) # open the…
parvaneh
  • 490
  • 2
  • 6
  • 16
1
vote
1 answer

Find invalid bz2 file, preferably using C/C++

I have around 200 thousand bz2 files, of which only one is valid. The size of each bz2 file is less than 200 bytes. I need to find the valid one. The command line bz2 utility is taking too much time. Is there a minimal check using file bytes by which I…
Shashwat Kumar
  • 5,159
  • 2
  • 30
  • 66
1
vote
1 answer

Google Dataflow creates only one worker for large .bz2 file

I am trying to process the Wikidata json dump using Cloud Dataflow. I have downloaded the file from https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2 and hosted it into a GS bucket. It's a large (50G) .bz2 file containing a list…
1
vote
4 answers

compress multiple files into a bz2 file in python

I need to compress multiple files into one bz2 file in Python. I'm trying to find a way but I can't find an answer. Is it possible?
Leandro
  • 870
  • 2
  • 13
  • 27
1
vote
1 answer

How to download bzip2 sources for linux?

I used to download http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz But now http://www.bzip.org/ does not exist anymore.
karelv
  • 756
  • 9
  • 20
1
vote
1 answer

extracting a .ppm.bz2 from a custom path to a custom path

As the title says, I have several folders containing several .ppm.bz2 files, and I want to extract them exactly where they are using Python. I am traversing the folders like this: import tarfile import os path =…
user10063119
1
vote
1 answer

Faster repetitive uses of bz2.BZ2File for pickling

I'm pickling multiple objects repeatedly, but not consecutively. But as it turned out, pickled output files were too large (about 256MB each). So I tried bz2.BZ2File instead of open, and each file became 1.3MB. (Yeah, wow.) The problem is that it…
noname
  • 343
  • 4
  • 14
1
vote
1 answer

Decompress bz2 files in 30,000 subfolders with python os walk?

I've got 30,000 folders and each folder contains 5 bz2 files of json data. I'm trying to use os.walk() to loop through the file path and decompress each compressed file and save in the original directory. import os import bz2 path =…
tomoc4
  • 337
  • 2
  • 10
  • 29
1
vote
2 answers

How can I get the 10 first lines of all my compressed files?

I have a bunch of M files, from which I want to extract the first N lines (from each). My files are compressed in BZ2. Otherwise, doing head -10 * would be enough. Ex: Assume I want to extract the 2 first lines from all my files (A.txt, B.txt, C.txt…
belka
  • 1,480
  • 1
  • 18
  • 31