Questions tagged [snappy]

Snappy is a compression algorithm for byte streams and a library implementing this algorithm. The standard distribution includes bindings for C and C++; there are third-party bindings for many other languages.

Snappy does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

Snappy is widely used inside Google, in everything from BigTable and MapReduce to Google's internal RPC systems.
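
For orientation, here is a minimal sketch of the raw block format in Python, using the third-party python-snappy bindings (pip install python-snappy):

    import snappy  # third-party python-snappy bindings

    original = b"wikipedia " * 1000           # repetitive input compresses well
    compressed = snappy.compress(original)    # one-shot block compression
    restored = snappy.uncompress(compressed)  # raises UncompressError on bad input

    assert restored == original
    print(len(original), "->", len(compressed), "bytes")
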

366 questions
5 votes • 1 answer

Parquet compression degradation when upgrading spark

I have a spark job that writes data to parquet files with snappy compression. One of the columns in parquet is a repeated INT64. When upgrading from spark 2.2 with parquet 1.8.2 to spark 3.1.1 with parquet 1.10.1, I witnessed a severe degradation in… (a minimal write sketch follows below)
Lior Chaga • 1,424 • 2 • 21 • 35
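
A minimal PySpark sketch of the kind of write the question describes, with hypothetical paths and column names; snappy is also Spark's default parquet codec:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("snappy-parquet-demo").getOrCreate()
    # A repeated INT64 column is modelled here as an array<long> column.
    df = spark.createDataFrame([(1, [1, 2, 3]), (2, [4, 5])], ["id", "values"])

    (df.write
       .option("compression", "snappy")  # explicit, though snappy is the default
       .parquet("/tmp/demo_snappy_parquet"))
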
5 votes • 2 answers

How to load json snappy compressed in HIVE

I have a bunch of JSON snappy-compressed files in HDFS. They are Hadoop snappy-compressed (not Python; cf. other SO questions) and have nested structures. I could not find a method to load them into HIVE (using json_tuple). Can I get some… (a decompression sketch follows below)
tensor • 3,088 • 8 • 37 • 71
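
A hedged sketch of getting at the data from Python: recent versions of the third-party python-snappy package ship a hadoop_snappy module for exactly this Hadoop framing. The file names below are hypothetical, and the module's availability depends on the installed version:

    import json
    from snappy import hadoop_snappy  # present in recent python-snappy releases

    # Decode the Hadoop-framed snappy file into plain newline-delimited JSON.
    with open("part-00000.snappy", "rb") as src, open("part-00000.json", "wb") as dst:
        hadoop_snappy.stream_decompress(src, dst)

    with open("part-00000.json") as f:
        print(json.loads(f.readline()))  # peek at the first nested record
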
5 votes • 1 answer

How to read in files with .snappy.parquet extension

I have files with the .snappy.parquet extension that I need to read into my Jupyter notebook and convert to a pandas DataFrame. import numpy import pyarrow.parquet as pq filename =… (a reading sketch follows below)
Chique_Code • 1,422 • 3 • 23 • 49
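
A minimal sketch with pyarrow, which picks the snappy codec up from the file metadata (the file name is hypothetical):

    import pyarrow.parquet as pq

    table = pq.read_table("data.snappy.parquet")  # codec handled transparently
    df = table.to_pandas()                        # pandas DataFrame
    print(df.head())
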
5 votes • 2 answers

HTTP Request Without curl or wget (Ubuntu Core Bash)

How do I make an HTTPS (or HTTP) request in Ubuntu Core? curl and wget are unavailable (and I don't know of any alternatives). I am trying to update DtDNS with this line: https://www.dtdns.com/api/autodns.cfm? -- Edit: Wasn't able to… (a Python fallback sketch follows below)
Maciek Rek • 1,525 • 2 • 14 • 18
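
One common fallback, assuming a Python interpreter is present on the device, is the standard library's urllib; the query parameters from the question are omitted here:

    from urllib.request import urlopen

    # Plain GET against the DtDNS endpoint named in the question.
    with urlopen("https://www.dtdns.com/api/autodns.cfm") as resp:
        print(resp.status, resp.read().decode(errors="replace"))
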
5 votes • 2 answers

How to configure Executor in Spark Local Mode

In short, I want to configure my application to use lz4 compression instead of snappy. What I did is: session = SparkSession.builder() .master(SPARK_MASTER) //local[1] .appName(SPARK_APP_NAME) … (a PySpark equivalent follows below)
Ning Lin • 51 • 1 • 5
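
A hedged PySpark equivalent: spark.io.compression.codec is the setting that selects the codec for Spark's internal data (shuffle spills, broadcasts, and so on):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("lz4-demo")
             .config("spark.io.compression.codec", "lz4")
             .getOrCreate())

    print(spark.conf.get("spark.io.compression.codec"))  # -> lz4
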
5 votes • 0 answers

Switching mongodb from snappy to zlib compression with minimal disruption

I've been using the default (snappy) compression in mongodb with WiredTiger storage engine, and I'd like to switch to zlib. However, I've got some users already active on my application, with data in the database, and I'd like to make sure I make… (a migration sketch follows below)
PSR • 51 • 3
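
WiredTiger fixes the block compressor per collection at creation time, so one low-disruption route is creating a zlib-compressed copy and migrating documents into it. A hedged pymongo sketch with hypothetical database and collection names:

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["mydb"]

    # New collection created with zlib instead of the snappy default.
    db.create_collection(
        "events_zlib",
        storageEngine={"wiredTiger": {"configString": "block_compressor=zlib"}},
    )
    for doc in db["events"].find():      # copy existing data across
        db["events_zlib"].insert_one(doc)
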
5 votes • 1 answer

Redshift COPY command for Parquet format with Snappy compression

I have datasets in HDFS which are in parquet format with snappy as the compression codec. As far as my research goes, Redshift currently accepts only plain text, json, and avro formats with gzip and lzo compression codecs. Alternatively, I am converting the… (a COPY sketch follows below)
cloudninja • 133 • 1 • 2 • 7
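
Redshift has since gained direct Parquet support in COPY (Snappy-compressed files included). A hedged sketch issuing that statement through psycopg2, with placeholder connection details, bucket, table, and IAM role:

    import psycopg2

    conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                            port=5439, dbname="dev",
                            user="admin", password="...")
    with conn, conn.cursor() as cur:
        cur.execute("""
            COPY my_table
            FROM 's3://my-bucket/parquet-path/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            FORMAT AS PARQUET;
        """)
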
5 votes • 2 answers

Best delimiter to safely parse byte arrays from a stream

I have a byte stream that returns a sequence of byte arrays, each of which represents a single record. I would like to parse the stream into a list of individual byte[]s. Currently, I have hacked in a three-byte delimiter so that I can identify the… (a length-prefix sketch follows below)
L. Blanc • 2,150 • 2 • 21 • 31
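
The usual alternative to a delimiter is length-prefix framing, which cannot collide with record contents; a minimal sketch:

    import io
    import struct

    def write_record(stream, record: bytes) -> None:
        stream.write(struct.pack(">I", len(record)))  # 4-byte big-endian length
        stream.write(record)

    def read_records(stream):
        while True:
            header = stream.read(4)
            if not header:
                return                        # clean end of stream
            (length,) = struct.unpack(">I", header)
            yield stream.read(length)

    buf = io.BytesIO()
    for rec in (b"\x00\x01\x02", b"any bytes, no escaping needed"):
        write_record(buf, rec)
    buf.seek(0)
    print(list(read_records(buf)))
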
5 votes • 1 answer

setting compression on hive table

I have a hive table based on avro schema. The table was created with the following query CREATE EXTERNAL TABLE datatbl PARTITIONED BY (date String, int time) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES… (a settings sketch follows below)
Vikas Saxena • 1,073 • 1 • 12 • 21
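
A hedged sketch of the usual knobs for Snappy output on an Avro-backed Hive table, issued here through a Hive-enabled PySpark session. The table names are hypothetical, and the settings themselves are Hive's, not Spark's:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-avro-snappy")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("SET hive.exec.compress.output=true")
    spark.sql("SET avro.output.codec=snappy")
    spark.sql("INSERT OVERWRITE TABLE datatbl_snappy SELECT * FROM datatbl")
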
5 votes • 2 answers

Valid uses cases for reinterpret_cast for unaligned memory access vs memcpy?

In the internals of snappy, there is a conditionally compiled section that selects dereferencing a reinterpret_cast'ed pointer as the best implementation for reads and writes of potentially unaligned 16, 32, and 64 bit integers on architectures that…
acm • 12,183 • 5 • 39 • 68
5 votes • 0 answers

Snappy compression returns only SNAPPY_INVALID_INPUT

I wrote a short category on NSData that does compression with libSnappy. It works like a charm during compression; however, the decompression code results in SNAPPY_INVALID_INPUT. The interesting part is, despite the invalid op-code, snappy still…
CodaFi • 43,043 • 8 • 107 • 153
4 votes • 0 answers

Snappy not linking with rust

I was trying to follow the FFI example in the Rustonomicon. I cloned the snappy repository and built it. I placed the snappy library inside the targets/debug/deps/ directory. Here is the code - use libc::size_t; #[link(name = "snappy",…
Cool Developer • 408 • 3 • 12
4 votes • 1 answer

Write parquet file with Snappy compression in Apache Beam

I am trying to write a parquet file as follows in Apache Beam using Snappy compression records.apply(FileIO.write().via(ParquetIO.sink(schema)).to(options.getOutput())); I see that it is possible to set AUTO, GZIP, BZIP2, ZIP and DEFLATE as… (a Python SDK sketch follows below)
hlagos • 7,690 • 3 • 23 • 41
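
In Beam's Python SDK the codec is exposed directly on WriteToParquet; a hedged sketch with a made-up schema and output path (the Java ParquetIO sink in the question may differ):

    import apache_beam as beam
    import pyarrow as pa
    from apache_beam.io.parquetio import WriteToParquet

    schema = pa.schema([("name", pa.string()), ("age", pa.int64())])

    with beam.Pipeline() as p:
        (p
         | beam.Create([{"name": "a", "age": 1}, {"name": "b", "age": 2}])
         | WriteToParquet("/tmp/out", schema, codec="snappy"))
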
4 votes • 4 answers

Problems installing snappy on python on Alpine Linux

When I try to install Snappy on Alpine Linux using pip install snappy, I get the following error when it tries to install a required package called cypari. I installed snappy with "apk add snappy". gcc -fno-strict-aliasing -Os…
dwardu • 573 • 6 • 22
4 votes • 2 answers

How do I enable Snappy codec support in a Spark cluster launched with Google Cloud Dataproc?

When attempting to read a Snappy compressed sequence file from a Spark cluster launched with Google Cloud Dataproc, I am receiving the following warning: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was…