Questions tagged [snappydata]

SnappyData is an open source integration of the GemFireXD in-memory database and the Apache Spark cluster computing system for OLTP, OLAP, and Approximate Query Processing workloads.

From https://github.com/SnappyDataInc/snappydata

SnappyData is a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing), and OLAP (online analytical processing) in a single integrated, highly concurrent, highly available cluster. This platform is realized through a seamless integration of Apache Spark (as a big data computational engine) with GemFireXD (as an in-memory transactional store with scale-out SQL semantics).

Within SnappyData, GemFireXD runs in the same JVMs as the Spark executors. This allows for optimal performance when moving data in and out of Spark executors, and it keeps the overall architecture simpler. All Spark jobs should run in SnappyData, though the SnappyData database can also be accessed using SQL via ODBC/JDBC, Thrift, or REST without needing to go through Spark.

SnappyData packages Approximate Query Processing (AQP) technology. The basic idea behind AQP is that one can use statistical sampling techniques and probabilistic data structures to answer aggregate-class queries without needing to store or operate over the entire data set. This approach trades query accuracy for quicker response times, allowing queries to run on large data sets while returning meaningful and accurate error information. A real-world analogue is the political polls run by Gallup and others, where a small sample is used to estimate support for a candidate within a small margin of error.
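The sampling idea above can be sketched in a few lines: estimate an aggregate from a small uniform sample and report a standard error alongside it. This is an illustration of the statistical principle only, not SnappyData's actual AQP implementation (the function name and error formula here are our own, assuming simple uniform sampling and a CLT-based error estimate).

```python
import random
import statistics

def approx_sum(values, fraction=0.01, seed=42):
    """Estimate sum(values) from a uniform random sample of the data.

    Returns (estimate, standard_error): the scaled sample sum plus a
    CLT-based standard error, so callers can report an error bound.
    """
    rng = random.Random(seed)
    n = len(values)
    k = max(2, int(n * fraction))      # sample size (at least 2 for stdev)
    sample = rng.sample(values, k)
    scale = n / k                      # inverse sampling fraction
    estimate = scale * sum(sample)
    # SE of the estimated sum is n * s / sqrt(k), rewritten via `scale`.
    se = scale * statistics.stdev(sample) * (k ** 0.5)
    return estimate, se
```

Scanning 1% of the rows costs roughly 1% of the work of a full scan, which is exactly the latency-for-accuracy trade described above.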

It's important to note that not all SQL queries can be answered through AQP, but by moving a subset of queries hitting the database to the AQP module, the system as a whole becomes more responsive and usable.
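A toy illustration of that routing decision, assuming a hypothetical `aqp_eligible` helper: only aggregate queries are candidates for the sample, while point lookups and other exact queries go to the base table. This heuristic is purely illustrative and is not SnappyData's actual routing logic.

```python
# Toy router: send aggregate queries to an AQP sample table, everything
# else to the exact base table. Illustrative only.
AGGREGATES = ("SUM(", "AVG(", "COUNT(")

def aqp_eligible(sql: str) -> bool:
    """Heuristic: only aggregate queries can be answered from a sample."""
    upper = sql.upper()
    return any(fn in upper for fn in AGGREGATES)

def route(sql: str) -> str:
    return "sample_table" if aqp_eligible(sql) else "base_table"
```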

Important links:

The SnappyData Github Repo

SnappyData public Slack/Gitter/IRC Channels

SnappyData technical paper

SnappyData Documentation

SnappyData ScalaDoc

SnappyData Screencasts

132 questions
0 votes, 1 answer

SnappyData colocated join with index

I would like to do colocated joins on two tables in SnappyData, and in order to further speed up the join, would it help if I also create indexes on the joining columns of the two tables? More specifically, the two tables would be quite large, and…
user3230153
0 votes, 1 answer

How to load CSV data into a SnappyData table in rowstore mode using a JDBC connection

Hi, I am starting to learn the SnappyData rowstore (link); I tried all the examples and they work, but I need to store CSV and JSON data in a SnappyData table. In the examples they manually connect via the snappy-shell and create the…
Karthik GB
0 votes, 1 answer

SnappyData with a pre-built existing Spark cluster

I integrated my Spark cluster with the Apache Hadoop configuration and it works fine. Then I integrated my Spark cluster with Azure Data Lake storage, which also works fine; for reference I used this link for Spark with Azure…
chinna
0 votes, 1 answer

Increase Java Memory on Spark for Building Large Hash Relations

I am currently trying to run a TPC-H query on SnappyData. At first the query gave me an error saying ERROR 38000: (SQLState=38000 Severity=-1) (Server=localhost[1528],Thread[DRDAConnThread_29,5,gemfirexd.daemons]) The exception 'Both sides of…
IFH
0 votes, 1 answer

SnappyData table definitions using partition keys

Reading through the documentation (http://snappydatainc.github.io/snappydata/streamingWithSQL/) and had a question about this item: "Reduced shuffling through co-partitioning: With SnappyData, the partitioning key used by the input queue (e.g., for…
mike w
0 votes, 1 answer

Unable to connect to snappydata store with spark-shell command

SnappyData v0.5 My goal is to start a "spark-shell" from my SnappyData install's /bin directory and issue Scala commands against existing tables in my SnappyData store. I am on the same host as my SnappyData store, locator, and lead (and yes, they…
Jason
0 votes, 1 answer

SnappyData streaming table error converting "timestamp" datatypes

I have a snappy streaming table that reads json from a kafka topic. After some work, I've got this working, but ran into an issue when trying to map java.sql.Timestamp values from my SensorData object to the streaming table. The error was…
mike w
0 votes, 1 answer

Database Profile for DBVisualizer & SnappyData?

We are using the DBVisualizer generic profile and wizard setup to access the SnappyData store; however, the tool randomly loses track of which driver class to load, and then you need to re-create the connection each time. Product: DbVisualizer Pro 9.2…
Jason
0 votes, 1 answer

How do I create VARCHAR in a column table DDL?

This DDL for a column table results in CLOB for the id_ and name_ fields. How can I get VARCHARs instead? snappy> CREATE TABLE EXAMPLE_COLUMN_TABLE ( id_ VARCHAR(64), name_ VARCHAR(128), time_ TIMESTAMP, number_ INTEGER ) USING…
Jason
0 votes, 1 answer

Where do I learn more about the SnappyData Column OLAP syntax?

I'm new to OLAP and SnappyData. My question is very specific. I want to know where to read further documentation on the SnappyData 'column' query language for OLAP queries. Perhaps it is an industry standard. Perhaps it is SnappyData specific. I…
Jason
0 votes, 1 answer

How to start a spark-shell using all snappydata cluster servers?

I can't seem to find a way to start a shell using all the servers set up in conf/servers. I only found it possible to submit jobs to the cluster using /bin/snappy-job.sh, where I specify the lead location, but I would like a real-time shell to perform…
Saif
-1 votes, 1 answer

Better performance and lower memory usage

I am developing an application where I will store complex XMLs in SnappyData for future analysis. For better analysis performance and lower memory consumption, what do you recommend: storing XML, JSON, or objects? Thanks in advance for your…