Questions tagged [snappydata]

SnappyData is an open source integration of the GemFireXD in-memory database and the Apache Spark cluster computing system for OLTP, OLAP, and Approximate Query Processing workloads.

From https://github.com/SnappyDataInc/snappydata

SnappyData is a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing) and OLAP (online analytical processing) in a single integrated, highly concurrent, highly available cluster. This platform is realized through a seamless integration of Apache Spark (as a big data computational engine) with GemFireXD (as an in-memory transactional store with scale-out SQL semantics).

Within SnappyData, GemFireXD runs in the same JVM as the Spark executors. This allows for optimal performance in moving data in and out of Spark executors and makes the overall architecture simpler. All Spark jobs should run in SnappyData, though the SnappyData database can also be accessed using SQL via ODBC/JDBC, Thrift, or REST without needing to go through Spark.

SnappyData packages Approximate Query Processing (AQP) technology. The basic idea behind AQP is that one can use statistical sampling techniques and probabilistic data structures to answer aggregate-class queries without needing to store or operate over the entire data set. This approach trades query accuracy for quicker response times, allowing queries to be run on large data sets with meaningful and accurate error information. A real-world example is political polling by Gallup and others, where a small sample is used to estimate support for a candidate within a small margin of error.

It's important to note that not all SQL queries can be answered through AQP, but by moving a subset of queries hitting the database to the AQP module, the system as a whole becomes more responsive and usable.
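The sampling idea described above can be illustrated with a minimal sketch in plain Python. This is not SnappyData's AQP implementation or API; the function name `approximate_mean`, the synthetic data, and the sample size are all assumptions chosen for illustration. It estimates an average from a uniform random sample and reports a margin of error using a normal approximation.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, margin), where margin is the half-width of an
    approximate 95% confidence interval (normal approximation, z=1.96).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    estimate = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_err


# Synthetic "table column": 100,000 readings (assumed data, not from the source).
data_rng = random.Random(42)
population = [data_rng.gauss(50.0, 10.0) for _ in range(100_000)]

exact = statistics.fmean(population)  # full scan over every row
estimate, margin = approximate_mean(population, sample_size=1_000)  # 1% of rows

print(f"exact mean      : {exact:.3f}")
print(f"approximate mean: {estimate:.3f} +/- {margin:.3f}")
```

The estimate touches only 1% of the rows, and the margin shrinks roughly with the square root of the sample size, which is the accuracy-for-latency trade-off described above.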

Important links:

The SnappyData Github Repo

SnappyData public Slack/Gitter/IRC Channels

SnappyData technical paper

SnappyData Documentation

SnappyData ScalaDoc

SnappyData Screencasts

132 questions
1
vote
2 answers

SnappyData JDBC driver raising SQLState=XCL14 error

SnappyData v.0-5 w/ ClientDriver JDBC driver. I have a persistent row table in SnappyData called: sensor_data. From the snappy> shell, this query returns thousands of rows. snappy> select * from sensor_data where year_num = 2013 and month_num =…
Jason
  • 2,006
  • 3
  • 21
  • 36
1
vote
1 answer

Schema not honored in queries

SnappyData v.0.5 I cannot seem to create row tables for a specific schema. This is important in a schema-based multi-tenant application where each tenant has his own schema. However, when I create my tables using RowStore DDL, they are queryable is…
Jason
  • 2,006
  • 3
  • 21
  • 36
1
vote
2 answers

Unable to script a replicated, persistent table from a CSV file

SnappyData v.0-5 Goal: I want to create a persistent, replicated ROAD table and load it from a CSV file using the Snappy Shell. The ROAD table should have 'road_id' as a primary key to prevent duplicate IDs. The commands I tried are: SET SCHEMA…
Jason
  • 2,006
  • 3
  • 21
  • 36
1
vote
2 answers

Tables created in Snappy shell do not show up in JDBC or Pulse

SnappyData v.0-5 The issue I am having is that my JDBC Connection's Table metadata and Pulse Web App do not see the table I created below. I create a table in SnappyData using the shell and a csv file. Data is here…
Jason
  • 2,006
  • 3
  • 21
  • 36
1
vote
1 answer

Connecting to AWS required a Win hosts file change

SnappyData v.0.5 In our AWS SnappyData instance, we have the following attributes: public IP: 52.x.x.x (exposed to the Internet) private/internal IP: 172.x.x.x (exposed only inside AWS) private/internal Name:…
Jason
  • 2,006
  • 3
  • 21
  • 36
1
vote
2 answers

Can I use a SnappyData JDBC connection with only a Locator and Server nodes?

SnappyData documentation and architecture diagrams seem to indicate that a JDBC thin client connection goes from a client to a Locator and then it is routed to a direct connection to a Server. If this is true, then I can run JDBC queries without a…
Jason
  • 2,006
  • 3
  • 21
  • 36
1
vote
1 answer

Lead node fails with /tmp/spark-jobserver/filedao/data/jars.data (Permission denied)

SnappyData v.0-5 I am logged into Ubuntu as a non-root user, 'foo'. SnappyData directory/install is owned by 'foo' user and 'foo' group. I am starting ALL nodes (locator,lead,server) with a script here: SNAPPY_HOME/sbin/snappy-start-all.sh Locator…
Jason
  • 2,006
  • 3
  • 21
  • 36
1
vote
2 answers

Questions on starting Locator using snappydata/bin> ./spark-shell.sh script

Spark v. 0.5 Here's the command I used to start a Locator: ubuntu@ip-172-31-8-115:/snappydata-0.5-bin/bin$ ./snappy-shell locator start Starting SnappyData Locator using peer discovery on: 0.0.0.0[10334] Starting DRDA server for SnappyData at…
Jason
  • 2,006
  • 3
  • 21
  • 36
1
vote
3 answers

Bean missing error when deploying SnappyData-0.5 pulse.war

I am trying to deploy the Pulse Web Application to an external Tomcat. I get this error when deploying. How should I fix this? org.springframework.beans.factory.NoSuchBeanDefinitionException: No bean named…
Jason
  • 2,006
  • 3
  • 21
  • 36
1
vote
1 answer

Setup snappydata with custom spark and scala 2.11

I have read through the documentation but can't find answers to the following questions: I would prefer to set up an already running Spark cluster (i.e. add a jar to be able to use SnappyContext), or is it mandatory to use the bundled Spark? If…
Saif
  • 57
  • 1
  • 7
1
vote
1 answer

How does indexedRDD in spark compare to SnappyData?

What is the status of the indexedRDD work in Spark? Has anyone looked at SnappyData? They make some claims around being able to do fast random reads and writes on dataframes.
0
votes
1 answer

PySpark Structured Streaming Query - query in dashboard visibility

I wrote some example code which connects to a Kafka broker, reads data from a topic, and sinks it to a SnappyData table. from pyspark.conf import SparkConf from pyspark.context import SparkContext from pyspark.sql import SQLContext, Row, SparkSession from…
tazonee
  • 13
  • 3
0
votes
1 answer

How to query a remote snappydata server from Python

I am trying to query SnappyData from Python, and some of the answers on Stack Overflow say that Python can't connect to remote Spark clusters. Could anyone help me connect to a SnappyData cluster and get a simple query working? Code I am trying…
Ezio
  • 376
  • 5
  • 21
0
votes
1 answer

Connect to snappy data with aws external IP address

I am using Tibco ComputeDB, which is new to me. It uses sparkDB and snappyData. I can start both Spark and SnappyData and connect to SnappyData using the command connect client '127.0.0.1:1527' or with the internal IP of the AWS server. But when I try to…
JSONX
  • 73
  • 5
0
votes
1 answer

Getting com.gemstone.gemfire.cache.LockTimeoutException after creating 20-30 tables in Snappydata

Spark job fetches the data from hbase and ingests the data to snappydata 1.1.0. Spark which is packaged with Snappydata 1.1.0 is launched as standalone cluster (snappy and spark share the cluster) and jobs are submitted to the Spark via spark…