Questions tagged [bz2]

For questions about bz2, the file extension commonly given to files compressed with bzip2.

Files compressed with bzip2 usually carry the .bz2 extension. Use bunzip2 (or bzip2 -d) to decompress them.

tar supports bzip2 through the -j option, which can be used to create or extract archives compressed with bzip2 (typically named .tar.bz2).
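
As a rough illustration of the two cases above (a single bzip2-compressed stream, and a tar archive compressed with bzip2), here is a minimal Python sketch using only the standard-library bz2 and tarfile modules. The function names and the member name hello.txt are hypothetical, chosen for this example; everything is done in memory so nothing touches the disk.

```python
import bz2
import io
import tarfile


def roundtrip_bz2(data: bytes) -> bytes:
    """Compress bytes with bzip2, then decompress them again (like bzip2 / bunzip2)."""
    compressed = bz2.compress(data)
    return bz2.decompress(compressed)


def make_tar_bz2(name: str, payload: bytes) -> bytes:
    """Build an in-memory .tar.bz2 archive holding one member (roughly what tar -cj produces)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        info = tarfile.TarInfo(name=name)  # hypothetical member name supplied by the caller
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_tar_bz2(archive: bytes) -> dict:
    """Read every regular file out of an in-memory .tar.bz2 archive (roughly tar -xj)."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:bz2") as tar:
        return {
            member.name: tar.extractfile(member).read()
            for member in tar.getmembers()
            if member.isfile()
        }
```

A bzip2 stream starts with the magic bytes BZh, so the first three bytes of the archive returned by make_tar_bz2 can be checked to confirm that the tar data really was compressed.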

See also the bzip2 tag.

106 questions
2
votes
0 answers

python: How do bz2 "incremental" and "one-shot" (de)compression differ from the "regular" method?

I have a series of directories, each about 38 MB on disk, that I need to pickle on a Python 3.6 Windows 10 system. When I ran the following code, the resulting .pickle files were huge, ~158 MB each: from six.moves import cPickle as pickle with…
Karl Baker
  • 903
  • 12
  • 27
2
votes
0 answers

Read a plain or bz2-compressed file line by line, detecting whether it is compressed (the file is large)

I wrote code to read a file that is either plain text or bz2-compressed. I use the magic bytes of the bz2 format to detect whether the file is compressed. Note: the user may or may not provide a file with the proper extension. My code: #include #include…
mr_beginner
  • 145
  • 1
  • 9
2
votes
1 answer

How can I increase parallelism with loading large XML file with spark-xml?

I have a modest-sized XML file (200 MB, bz2) that I am loading using spark-xml on an AWS EMR cluster with one master and two core nodes, each with 8 CPUs and 32 GB RAM. import org.apache.spark.sql.SQLContext import com.databricks.spark.xml._ val…
seandavi
  • 2,818
  • 4
  • 25
  • 52
2
votes
0 answers

Python3.6.3, ModuleNotFoundError: No module named '_bz2'

On Linux (CentOS), I downloaded the bzip2 tarball (bzip2-1.0.6.tar.gz) and ran make && make install. Then I recompiled Python-3.6.3 with ./configure --prefix=/home/gt/Py36 followed by make && make install. Then I imported bz2 in /home/gt/Py36/bin/python3 and…
DunkOnly
  • 1,682
  • 4
  • 17
  • 39
2
votes
1 answer

Python3: how to read the txt.bz2 file

I have a text file compressed with bz2. The data in the text file looks like the following: 1 x3, x32, f5 0 f4, g6, h7, j9 ............. I know how to load the text file with the following code: rf = open('small.txt', 'r') lines =…
tktktk0711
  • 1,656
  • 7
  • 32
  • 59
2
votes
0 answers

Efficient ingestion of large bz2 files in Spark

Is there a way to efficiently ingest large (e.g. 50 GB) bz2 files in Spark? I'm using Spark 1.6.1, 8 executors with 30 GB of RAM each. Initially, each executor had 4 cores. However, opening bz2 files with textFile() throws ArrayOutOfBoundsException.…
Marco
  • 180
  • 1
  • 8
2
votes
1 answer

Extracting bz2 file with single file in memory

I have a csv file compressed into a bz2 file that I'm trying to load from a website, decompress, and write to a local csv file by # Get zip file from website archive = StringIO() url_data = urllib2.urlopen(url) archive.write(url_data.read()) #…
Daniel Underwood
  • 2,191
  • 2
  • 22
  • 48
1
vote
1 answer

TypeError: a bytes-like object is required, not 'str' when trying to write to a csv.writer that uses a bz2.BZ2File object

Background: I need to write a CSV file that I compress before putting to disk as I'm running about 96 processes simultaneously on an SMP and they otherwise fill up the tiny hard drive space I have before I can offload them elsewhere (no, it's not my…
Gabe
  • 131
  • 1
  • 13
1
vote
1 answer

Getting error bz2 module requires libbz2 >= 1.0.0 while compiling php 8.1.13

I am trying to compile PHP with ./configure --with-bz2=/path_to_bzip2/bzip2/1.0.6, but when the build reaches bz2 it gives the error below: checking for BZip2 support... yes checking for BZ2_bzerror in -lbz2... no configure:…
Lokesh Purohit
  • 523
  • 6
  • 9
1
vote
1 answer

Read json from file.json.bz2 quickly

I'm trying to open a bz2 file and read the json file contained inside. My current implementation looks like with bz2.open(bz2_file_path, 'rb') as f: json_content = f.read() json_df = pd.read_json(json_content.decode('utf-8'), lines = True) I…
baked goods
  • 237
  • 2
  • 10
1
vote
0 answers

Go: How to read in MRT (.bz2) file as byte and parse data

I am trying to read an MRT file (with a .bz2 extension) from archive.routeviews.org, namely http://archive.routeviews.org/route-views.chile/bgpdata/2022.05/UPDATES/updates.20220501.0000.bz2. I have found some code online that parses it using…
e110
  • 23
  • 1
  • 5
1
vote
0 answers

how to solve this error: "'TreeEnsemble' object has no attribute 'model_output'"

''' features = [gender, SeniorCitizen, Partner, Dependents, Tenure, PhoneService, MultipleLines, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies, PaperlessBilling, MonthlyCharges,…
Jasmine
  • 11
  • 3
1
vote
2 answers

Download bz2, read compressed files in memory (avoid memory overflow)

As the title says, I'm downloading a bz2 file that has a folder inside with a lot of text files... My first version decompressed in memory, but although it is only 90 MB, when you uncompress it, it has 60 files of 750 MB each... The computer goes boom!…
1
vote
0 answers

Python bz2 readlines slow in byte-mode

I have a bz2-compressed log file with a lot of lines. Every line has to undergo a small analysis which is of no importance here. I started by reading the lines in text mode like: import bz2 path = 'content.log.bz2' def method_1(path): with…
Durtal
  • 1,063
  • 3
  • 11
1
vote
0 answers

Python3.7.4 - Error while importing pandas libraries

I installed Python manually using the commands below: #cd Python-3.7.4 #./configure --enable-optimizations #make altinstall and later installed the NumPy and Pandas libraries manually as below: #python3.7 setup.py install ERROR - >>> import pandas as…
vivekdesai
  • 33
  • 2