Questions tagged [snappydata]

SnappyData is an open source integration of the GemFireXD in-memory database and the Apache Spark cluster computing system for OLTP, OLAP, and Approximate Query Processing workloads.

From https://github.com/SnappyDataInc/snappydata

SnappyData is a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing), and OLAP (online analytical processing) in a single integrated, highly concurrent, highly available cluster. This platform is realized through a seamless integration of Apache Spark (as a big data computational engine) with GemFireXD (as an in-memory transactional store with scale-out SQL semantics).

Within SnappyData, GemFireXD runs in the same JVM as the Spark executors. This allows for optimal performance when moving data in and out of Spark executors, and it keeps the overall architecture simpler. All Spark jobs should run within SnappyData, though the SnappyData database can also be accessed using SQL via ODBC/JDBC, Thrift, or REST, without needing to go through Spark.

SnappyData packages Approximate Query Processing (AQP) technology. The basic idea behind AQP is that statistical sampling techniques and probabilistic data structures can answer aggregate-class queries without storing or operating over the entire data set. This approach trades query accuracy for quicker response times, allowing queries over large data sets to return results with meaningful, accurate error information. A real-world analogy is political polling by Gallup and others, where a small sample is used to estimate support for a candidate within a small margin of error.

It's important to note that not all SQL queries can be answered through AQP, but by moving a subset of queries hitting the database to the AQP module, the system as a whole becomes more responsive and usable.
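The sampling idea behind AQP can be sketched in plain Python. This is a generic illustration of sample-based aggregate estimation, not SnappyData's AQP engine or API; the function name `approx_avg`, the 1% sample fraction, and the synthetic data are all invented for the example:

```python
import random
import statistics

def approx_avg(values, sample_frac=0.01, seed=42):
    """Estimate the mean of `values` from a small uniform random sample.

    Returns (estimate, half_width), where half_width is an approximate
    95% confidence bound derived from the sample's standard error.
    """
    rng = random.Random(seed)
    n = max(2, int(len(values) * sample_frac))
    sample = rng.sample(values, n)
    estimate = statistics.mean(sample)
    # 1.96 standard errors ~ a 95% normal-approximation confidence bound
    half_width = 1.96 * statistics.stdev(sample) / (n ** 0.5)
    return estimate, half_width

# One million synthetic "rows"; the true mean is close to 500.
rng = random.Random(7)
data = [rng.uniform(0, 1000) for _ in range(1_000_000)]

exact = statistics.mean(data)        # full scan over every row
approx, err = approx_avg(data)       # scans only ~1% of the rows
print(f"exact={exact:.1f}  approx={approx:.1f} +/- {err:.1f}")
```

As with a political poll, the 1% sample answers the aggregate query within a stated margin of error at a fraction of the scan cost; AQP systems apply the same principle to SQL aggregates over sample tables maintained inside the store.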

Important links:

The SnappyData Github Repo

SnappyData public Slack/Gitter/IRC Channels

SnappyData technical paper

SnappyData Documentation

SnappyData ScalaDoc

SnappyData Screencasts

132 questions

snappystore - VM is exiting - shutting down distributed system

snappystore - VM is exiting - shutting down distributed system org.apache.spark.SparkContext - Invoking stop() from shutdown hook o.e.jetty.server.ServerConnector - Stopped ServerConnector@244e619a{HTTP/1.1}{0.0.0.0:4040} ERROR…

Querying SnappyData records from 1000-th to 1010-th

Is there any pagination functionality in SnappyData using sql? For example, there are 10000 records in a table. Can we do a query to “get 10 entries starting from 1000 to 1010”? We need this to support pagination feature in our REST API.

Snappy: How can I snappy compress a file?

I received a file, and I need to compress it and pass the name of the compressed file as an argument to another method. File fileName = new File("fileA"); How can I Snappy-compress this file and get the name of the compressed…

SnappyData or SnappySession: SignalHandler: received explicit OS signal SIGPIPE

Get this error when sending data to the cluster: 2018-01-22 18:49:54 101 4859929 [SIGPIPE handler] WARN snappystore - SignalHandler: received explicit OS signal SIGPIPE java.lang.Throwable: null at…

Fastest way to create Dictionary from pyspark DF

I'm using SnappyData with PySpark to run my SQL queries and convert the output DF into a dictionary, to bulk insert it into Mongo. I've gone through many similar questions to test the conversion of a Spark DF to a dictionary. Currently I'm using…

Create a table in SnappyData for large data set

I have 33 million records that I want to insert into the SnappyData database. I've already tried to create a column table without setting its options. The problem is that Spark is loading the whole database into RAM. I want to set the…

Snappydata - sql put into on jobserver don't aggregate values

I'm trying to create a jar to run on the snappy-job shell with streaming. I have an aggregation function, and it works over windows perfectly. But I need a table with one value for each key. Based on an example from GitHub I created a jar file, and now I…

SnappyData - snappy-job - cannot run jar file

I'm trying to run a jar file from the SnappyData CLI. I just want to create a SparkSession and a SnappySession at the beginning. package io.test import org.apache.spark.sql.{SnappySession, SparkSession} object snappyTest { def main(args: Array[String])…

SnappyData SQL if else

I have two tables that I need to join, table_A(ID, val) and table_B(ID, val), to get a new table RESULT(ID, value), where the value should be populated like this. Case 1: if there is an ID that exists in both table_A and table_B, value should be 1,…

Keep part of the data in memory and part in Disk

I have a column table with millions of records. I would like to keep only the last 3 months in memory; the rest needs to stay on disk but remain queryable. Is it possible to do this in SnappyData?

Can't connect to snappydata store in python

I am running the Docker image for SnappyData v0.9. From inside that image, I can run queries against the database. However, I cannot do so from a second server on my machine. I copied the Python files from SnappyData to the installed PySpark…

How to use embedded Spark with existing SnappyData

I have used snappy-sql, and there I created some tables and did some inserts and queries… everything OK. Then, as I need to import a lot of data from CSV files, I created a Scala script that reads each of the files, extracts the data, and tries to…

SnappyData data type compatibility

As I am creating a new database and importing the schema structure from PostgreSQL into SnappyData, I am dealing with the problem of what to use to replace each data type. First, what to use for data compatible with timestamp, real, double, long…

Unable to access SnappyData store from an existing Spark installation using the Smart Connector (Java)

I tried to connect to the SnappyData store in Smart Connector mode, as described in http://snappydatainc.github.io/snappydata/howto/#how-to-access-snappydata-store-from-an-existing-spark-installation-using-smart-connector, but got…

SnappyData (table is not showing in cluster)

I've created a program for SnappyData in Java. I'm not able to see the table name in the cluster. Also, I can't understand the log file. Any hints? public static void main( String[] args ) { SparkSession spark = SparkSession .builder() …