Questions tagged [bz2]

For issues relating to bz2, the file extension commonly given to files compressed with bzip2.

Files compressed with bzip2 are frequently given the bz2 extension. Use bunzip2 (equivalent to bzip2 -d) to decompress these files.

tar supports bzip2 with the -j option, which can be used to extract or create archives that are also compressed with bzip2.
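For readers who work from Python rather than the shell, the same operations can be sketched with the standard library's `bz2` and `tarfile` modules. This is a minimal illustration, not a replacement for the command-line tools: the helper names (`bz2_round_trip`, `make_tar_bz2`, `read_tar_bz2`) and the in-memory member names are hypothetical and chosen only for this example.

```python
import bz2
import io
import tarfile


def bz2_round_trip(data: bytes) -> bytes:
    """Compress and then decompress bytes, roughly bzip2 followed by bunzip2."""
    compressed = bz2.compress(data)
    return bz2.decompress(compressed)


def make_tar_bz2(members: dict[str, bytes]) -> bytes:
    """Build an in-memory .tar.bz2 archive, comparable to `tar -cjf`."""
    buffer = io.BytesIO()
    # Mode "w:bz2" asks tarfile to write a bzip2-compressed archive.
    with tarfile.open(fileobj=buffer, mode="w:bz2") as archive:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_tar_bz2(blob: bytes) -> dict[str, bytes]:
    """Read every regular file from a .tar.bz2 archive, comparable to `tar -xjf`."""
    contents = {}
    # Mode "r:bz2" asks tarfile to read a bzip2-compressed archive.
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:bz2") as archive:
        for member in archive.getmembers():
            if member.isfile():
                contents[member.name] = archive.extractfile(member).read()
    return contents


if __name__ == "__main__":
    archive_bytes = make_tar_bz2({"a.txt": b"hello", "b.txt": b"world"})
    print(sorted(read_tar_bz2(archive_bytes)))
```

The sketch keeps everything in memory so it can run without touching the filesystem; for real files, `tarfile.open("archive.tar.bz2", "r:bz2")` and `bz2.open("file.bz2", "rb")` accept a path instead of a file object.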

Also see tag bzip2

106 questions
0
votes
1 answer

How to split big 30GB bz2 file into multiple small bz2 files and add a header to each

I have a large number of bz2-formatted files (30GB each) without any header. I can split them easily into 500M pieces with the following pipeline. bzcat logging.abc_gps.bz2 | pv | split -b 500M -d -a 4 --filter='bzip > $FILE.csv.bz2' -…
MALAM
  • 37
  • 7
0
votes
1 answer

Connect to bz2 sqlite database in Python

I have a bz2 file (I have never worked with such files). When I manually unzip it, I see it's a sqlite db with several tables in it, but I don't know how to connect to it all from python without having to unzip it manually (I have many dbs so it has…
dc2
  • 157
  • 1
  • 1
  • 6
0
votes
2 answers

Is there a way to skip first x lines of a bz2 file in Python without calling next()?

I'm trying to read the latest Wikidata dump while skipping the first, say, 100 lines. Is there a better way to do this than calling next() repeatedly? WIKIDATA_JSON_DUMP = bz2.open('latest-all.json.bz2', 'rt') for n in range(100): …
zadrozny
  • 1,631
  • 3
  • 22
  • 27
0
votes
1 answer

Python bz2 raises EOFError before the whole file has been read

I am trying to lazily load items from a compressed file that resides in Zenodo. My goal is to iteratively yield the items without storing the file on my computer. My problem is that an EOFError occurs right after the first non-empty line is read.…
Carlos
  • 51
  • 7
0
votes
1 answer

How to *properly* compress and decompress a text file using bz2 and python

So I've had this system that scrapes and compresses files for a while now using bz2 compression. The way it does so is using the following block of code I found on SO a few months back: Let's assume for the purposes of this post the filename is…
JoeVictor
  • 1,806
  • 1
  • 17
  • 38
0
votes
0 answers

Issue with Pycharm Environment Pandas/BZ2

In trying to import and run pandas in Pycharm I get the following error: <<>>\<>>\lib\site-packages\numpy\_distributor_init.py:32: UserWarning: loaded more than 1 DLL from…
Mark McGown
  • 975
  • 1
  • 10
  • 26
0
votes
1 answer

Python Multiprocessing with limited resources

Problem Statement I'm currently building an exchange scraper with three tasks, each running on its own process: #1: Receive a live webfeed: very fast data coming in, immediately put in a multiprocessing Queue and continue. #2: Consume queue data…
JoeVictor
  • 1,806
  • 1
  • 17
  • 38
0
votes
2 answers

Speed up reading in a compressed bz2 file ('rb' mode)

I have a BZ2 file of more than 10GB. I'd like to read it without decompressing it into a temporary file (it would be more than 50GB). With this method: import bz2, time t0 = time.time() time.sleep(0.001) # to avoid / by 0 with…
Basj
  • 41,386
  • 99
  • 383
  • 673
0
votes
0 answers

Python: Ignore EOF in XML file

I'm currently working on a project that involves getting article-titles from the Wikipedia dump. The downloadable file is in .bz2 format and contains an XML file that would be about 80GB in size if I were to unpack it. I can open and read the first…
wustus
  • 17
  • 5
0
votes
1 answer

I can't locate the exe file in the bz2 I have downloaded

Essentially I need to download a bz2 file, save it, and run the exe file within a program I am using (pano2vr). The issue is that I can't find the exe file in the bz2 download. Here are the instructions: https://ggnome.com/doc/glossary_ffmpeg/ Here is the…
Casey
  • 21
  • 1
  • 6
0
votes
1 answer

How to extract parts of a bzipped PostgreSQL dump

I have a PostgreSQL plain format dump and need only two or three tables' data. The dump is in gz2 format. bzcat dump.sql.gz | perl -lne 'print if /^COPY tablename/../^\\\.$/' > something.sql not working. also tried bzip2 -dc dump.sql.gz|perl -lne…
S_M
  • 31
  • 4
0
votes
1 answer

usr/local/lib/libbz2.a: could not read symbols: Bad value

While installing python, I was getting following error: usr/local/lib/libbz2.a: could not read symbols: Bad value /usr/bin/ld: /usr/local/lib/libbz2.a(bzlib.o): relocation R_X86_64_32S against `.text’ can not be used when making a shared object;…
Bhanuday Sharma
  • 323
  • 3
  • 10
0
votes
0 answers

Unzip BZ2 format using core c# libraries

I am trying to unzip a .bz2 file. I don't want to use external unzipping libraries or third-party NuGet packages. Is there a way to unzip using only C# core libraries?
0
votes
0 answers

python bz2.decompress adds header

I am trying to decode a .bz2 file in Python. The problem seems to come when I use the decompress method as it adds a header/prefix before the original data. import bz2 with open("/Users/X/exampleFiles/secondP5D.bz2", "rb") as f: …
user2728349
  • 139
  • 1
  • 3
  • 12
0
votes
1 answer

How to read a .pgn.bz2 file with fread in R?

I am trying to read chess game files from https://database.lichess.org/ where the files are stored as a bzip of a pgn. A sample format of a pgn file would look something like this: [Event "4th Bayern-chI Bank Hofmann"] [Site "?"] [Date…
User2321
  • 2,952
  • 23
  • 46