Questions tagged [snappy]

Snappy is a compression algorithm for byte streams and a library implementing this algorithm. The standard distribution includes bindings for C and C++; there are third-party bindings for many other languages.

Snappy does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

Snappy is widely used inside Google, in everything from BigTable and MapReduce to Google's internal RPC systems.
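
For orientation, here is a minimal sketch of the raw block format in Python, using the third-party python-snappy bindings (pip install python-snappy):

    import snappy  # third-party python-snappy bindings

    original = b"wikipedia " * 1000           # repetitive input compresses well
    compressed = snappy.compress(original)    # one-shot block compression
    restored = snappy.uncompress(compressed)  # raises UncompressError on bad input

    assert restored == original
    print(len(original), "->", len(compressed), "bytes")
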

366 questions
5 votes • 1 answer

Parquet compression degradation when upgrading spark

I have a spark job that writes data to parquet files with snappy compression. One of the columns in parquet is a repeated INT64. When upgrading from spark 2.2 with parquet 1.8.2 to spark 3.1.1 with parquet 1.10.1, I witnessed a severe degradation in… (a minimal write sketch follows below)
Lior Chaga • 1,424 • 2 • 21 • 35
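
A minimal PySpark sketch of the kind of write the question describes, with hypothetical paths and column names; snappy is also Spark's default parquet codec:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("snappy-parquet-demo").getOrCreate()
    # A repeated INT64 column is modelled here as an array<long> column.
    df = spark.createDataFrame([(1, [1, 2, 3]), (2, [4, 5])], ["id", "values"])

    (df.write
       .option("compression", "snappy")  # explicit, though snappy is the default
       .parquet("/tmp/demo_snappy_parquet"))
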
5 votes • 2 answers

How to load json snappy compressed in HIVE

I have a bunch of JSON snappy-compressed files in HDFS. They are Hadoop snappy-compressed (not Python; cf. other SO questions) and have nested structures. I could not find a method to load them into HIVE (using json_tuple). Can I get some… (a decompression sketch follows below)
tensor • 3,088 • 8 • 37 • 71
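
A hedged sketch of getting at the data from Python: recent versions of the third-party python-snappy package ship a hadoop_snappy module for exactly this Hadoop framing. The file names below are hypothetical, and the module's availability depends on the installed version:

    import json
    from snappy import hadoop_snappy  # present in recent python-snappy releases

    # Decode the Hadoop-framed snappy file into plain newline-delimited JSON.
    with open("part-00000.snappy", "rb") as src, open("part-00000.json", "wb") as dst:
        hadoop_snappy.stream_decompress(src, dst)

    with open("part-00000.json") as f:
        print(json.loads(f.readline()))  # peek at the first nested record
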
5 votes • 1 answer

How to read in files with .snappy.parquet extension

I have files with the .snappy.parquet extension that I need to read into my Jupyter notebook and convert to a pandas DataFrame. import numpy import pyarrow.parquet as pq filename =… (a reading sketch follows below)
Chique_Code • 1,422 • 3 • 23 • 49
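
A minimal sketch with pyarrow, which picks the snappy codec up from the file metadata (the file name is hypothetical):

    import pyarrow.parquet as pq

    table = pq.read_table("data.snappy.parquet")  # codec handled transparently
    df = table.to_pandas()                        # pandas DataFrame
    print(df.head())
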
5 votes • 2 answers

HTTP Request Without curl or wget (Ubuntu Core Bash)

How do I make an HTTPS (or HTTP) request in Ubuntu Core? curl and wget are unavailable (and I don't know of any alternatives). I am trying to update DtDNS with this line: https://www.dtdns.com/api/autodns.cfm? -- Edit: Wasn't able to… (a Python fallback sketch follows below)
Maciek Rek • 1,525 • 2 • 14 • 18
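
One common fallback, assuming a Python interpreter is present on the device, is the standard library's urllib; the query parameters from the question are omitted here:

    from urllib.request import urlopen

    # Plain GET against the DtDNS endpoint named in the question.
    with urlopen("https://www.dtdns.com/api/autodns.cfm") as resp:
        print(resp.status, resp.read().decode(errors="replace"))
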
5 votes • 2 answers

How to configure Executor in Spark Local Mode

In short, I want to configure my application to use lz4 compression instead of snappy. What I did is: session = SparkSession.builder() .master(SPARK_MASTER) //local[1] .appName(SPARK_APP_NAME) … (a PySpark equivalent follows below)
Ning Lin • 51 • 1 • 5
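
A hedged PySpark equivalent: spark.io.compression.codec is the setting that selects the codec for Spark's internal data (shuffle spills, broadcasts, and so on):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("lz4-demo")
             .config("spark.io.compression.codec", "lz4")
             .getOrCreate())

    print(spark.conf.get("spark.io.compression.codec"))  # -> lz4
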
5 votes • 0 answers

Switching mongodb from snappy to zlib compression with minimal disruption

I've been using the default (snappy) compression in mongodb with WiredTiger storage engine, and I'd like to switch to zlib. However, I've got some users already active on my application, with data in the database, and I'd like to make sure I make… (a migration sketch follows below)
PSR • 51 • 3
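
WiredTiger fixes the block compressor per collection at creation time, so one low-disruption route is creating a zlib-compressed copy and migrating documents into it. A hedged pymongo sketch with hypothetical database and collection names:

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["mydb"]

    # New collection created with zlib instead of the snappy default.
    db.create_collection(
        "events_zlib",
        storageEngine={"wiredTiger": {"configString": "block_compressor=zlib"}},
    )
    for doc in db["events"].find():      # copy existing data across
        db["events_zlib"].insert_one(doc)
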
5 votes • 1 answer

Redshift COPY command for Parquet format with Snappy compression

I have datasets in HDFS which are in parquet format with snappy as the compression codec. As far as my research goes, Redshift currently accepts only plain text, json, and avro formats with gzip and lzo compression codecs. Alternatively, I am converting the… (a COPY sketch follows below)
cloudninja • 133 • 1 • 2 • 7
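
Redshift has since gained direct Parquet support in COPY (Snappy-compressed files included). A hedged sketch issuing that statement through psycopg2, with placeholder connection details, bucket, table, and IAM role:

    import psycopg2

    conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                            port=5439, dbname="dev",
                            user="admin", password="...")
    with conn, conn.cursor() as cur:
        cur.execute("""
            COPY my_table
            FROM 's3://my-bucket/parquet-path/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            FORMAT AS PARQUET;
        """)
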
5 votes • 2 answers

Best delimiter to safely parse byte arrays from a stream

I have a byte stream that returns a sequence of byte arrays, each of which represents a single record. I would like to parse the stream into a list of individual byte[]s. Currently, I have hacked in a three-byte delimiter so that I can identify the… (a length-prefix sketch follows below)
L. Blanc • 2,150 • 2 • 21 • 31
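
The usual alternative to a delimiter is length-prefix framing, which cannot collide with record contents; a minimal sketch:

    import io
    import struct

    def write_record(stream, record: bytes) -> None:
        stream.write(struct.pack(">I", len(record)))  # 4-byte big-endian length
        stream.write(record)

    def read_records(stream):
        while True:
            header = stream.read(4)
            if not header:
                return                        # clean end of stream
            (length,) = struct.unpack(">I", header)
            yield stream.read(length)

    buf = io.BytesIO()
    for rec in (b"\x00\x01\x02", b"any bytes, no escaping needed"):
        write_record(buf, rec)
    buf.seek(0)
    print(list(read_records(buf)))
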
5 votes • 1 answer

setting compression on hive table

I have a hive table based on avro schema. The table was created with the following query CREATE EXTERNAL TABLE datatbl PARTITIONED BY (date String, int time) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES… (a settings sketch follows below)
Vikas Saxena • 1,073 • 1 • 12 • 21
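
A hedged sketch of the usual knobs for Snappy output on an Avro-backed Hive table, issued here through a Hive-enabled PySpark session. The table names are hypothetical, and the settings themselves are Hive's, not Spark's:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-avro-snappy")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("SET hive.exec.compress.output=true")
    spark.sql("SET avro.output.codec=snappy")
    spark.sql("INSERT OVERWRITE TABLE datatbl_snappy SELECT * FROM datatbl")
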
5 votes • 2 answers

Valid uses cases for reinterpret_cast for unaligned memory access vs memcpy?

In the internals of snappy, there is a conditionally compiled section that selects dereferencing a reinterpret_cast'ed pointer as the best implementation for reads and writes of potentially unaligned 16, 32, and 64 bit integers on architectures that…
acm • 12,183 • 5 • 39 • 68
5 votes • 0 answers

Snappy compression returns only SNAPPY_INVALID_INPUT

I wrote a short category on NSData that does compression with libSnappy. It works like a charm during compression; however, the decompression code results in SNAPPY_INVALID_INPUT. The interesting part is, despite the invalid op-code, snappy still…
CodaFi • 43,043 • 8 • 107 • 153
4 votes • 0 answers

Snappy not linking with rust

I was trying to follow the FFI example in the Rustonomicon. I cloned the snappy repository and built it. I placed the snappy library inside the targets/debug/deps/ directory. Here is the code - use libc::size_t; #[link(name = "snappy",…
Cool Developer • 408 • 3 • 12
4 votes • 1 answer

Write parquet file with Snappy compression in Apache Beam

I am trying to write a parquet file as follows in Apache Beam using Snappy compression records.apply(FileIO.write().via(ParquetIO.sink(schema)).to(options.getOutput())); I see that it is possible to set AUTO, GZIP, BZIP2, ZIP and DEFLATE as… (a Python SDK sketch follows below)
hlagos • 7,690 • 3 • 23 • 41
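
In Beam's Python SDK the codec is exposed directly on WriteToParquet; a hedged sketch with a made-up schema and output path (the Java ParquetIO sink in the question may differ):

    import apache_beam as beam
    import pyarrow as pa
    from apache_beam.io.parquetio import WriteToParquet

    schema = pa.schema([("name", pa.string()), ("age", pa.int64())])

    with beam.Pipeline() as p:
        (p
         | beam.Create([{"name": "a", "age": 1}, {"name": "b", "age": 2}])
         | WriteToParquet("/tmp/out", schema, codec="snappy"))
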
4 votes • 4 answers

Problems installing snappy on python on Alpine Linux

When I try to install Snappy on Alpine Linux using pip install snappy, I get the following error when it tries to install a required package called cypari. I installed snappy with "apk add snappy". gcc -fno-strict-aliasing -Os…
dwardu • 573 • 6 • 22
4 votes • 2 answers

How do I enable Snappy codec support in a Spark cluster launched with Google Cloud Dataproc?

When attempting to read a Snappy compressed sequence file from a Spark cluster launched with Google Cloud Dataproc, I am receiving the following warning: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was…