Questions tagged [snappydata]

SnappyData is an open source integration of the GemFireXD in-memory database and the Apache Spark cluster computing system for OLTP, OLAP, and Approximate Query Processing workloads.

From https://github.com/SnappyDataInc/snappydata

SnappyData is a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing), and OLAP (online analytical processing) in a single integrated, highly concurrent, highly available cluster. This platform is realized through a seamless integration of Apache Spark (as a big data computational engine) with GemFireXD (as an in-memory transactional store with scale-out SQL semantics).

Within SnappyData, GemFireXD runs in the same JVM as the Spark executors. This gives optimal performance when moving data in and out of Spark executors and also keeps the overall architecture simpler. All Spark jobs should run in SnappyData, though the SnappyData database can also be accessed with SQL via ODBC/JDBC, Thrift, or REST without going through Spark.

SnappyData packages Approximate Query Processing (AQP) technology. The basic idea behind AQP is to use statistical sampling techniques and probabilistic data structures to answer aggregate-class queries without needing to store or operate over the entire data set. This approach trades some query accuracy for quicker response times, so that queries over large data sets can return quickly along with meaningful, accurate error information. A real-world analogy is political polling by Gallup and others, where a small sample is used to estimate support for a candidate within a small margin of error.
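The sampling idea can be made concrete with a minimal Python sketch. It is not SnappyData's AQP implementation or API: it assumes a simple uniform random sample without replacement (real AQP engines may use stratified samples and other synopses) and attaches an approximate 95% margin of error to AVG and SUM estimates using the normal approximation. The function names `approximate_avg` and `approximate_sum` and the synthetic data are hypothetical.

```python
import math
import random
import statistics

# Hypothetical helpers illustrating sampling-based AQP; not a SnappyData API.


def approximate_avg(population, sample_size, z=1.96, rng=None):
    """Estimate AVG(population) from a uniform random sample.

    Returns (estimate, margin_of_error); with the default z=1.96 the
    interval estimate +/- margin_of_error is an approximate 95% confidence
    interval (normal approximation with a finite-population correction).
    """
    if rng is None:
        rng = random.Random()
    n, N = sample_size, len(population)
    if not 2 <= n <= N:
        raise ValueError("sample_size must be between 2 and len(population)")
    sample = rng.sample(population, n)                  # uniform, without replacement
    estimate = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    fpc = math.sqrt((N - n) / (N - 1))                  # shrinks to 0 as n approaches N
    return estimate, z * std_err * fpc


def approximate_sum(population, sample_size, z=1.96, rng=None):
    """Estimate SUM(population) by scaling the AVG estimate by N."""
    N = len(population)
    estimate, margin = approximate_avg(population, sample_size, z, rng)
    return estimate * N, margin * N


if __name__ == "__main__":
    # Synthetic column of one million rows; the exact AVG is 49.5.
    rows = [i % 100 for i in range(1_000_000)]
    est, moe = approximate_avg(rows, 10_000, rng=random.Random(7))
    exact = sum(rows) / len(rows)
    print(f"AVG ~ {est:.2f} +/- {moe:.2f} from 1% of the rows (exact: {exact})")
```

Shrinking the sample trades accuracy for speed: the margin of error scales roughly with 1/sqrt(n), so a 100x smaller sample gives about a 10x wider interval, which is the same trade-off a pollster makes when choosing a sample size.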

Not every SQL query can be answered through AQP, but by routing a subset of the queries that hit the database to the AQP module, the system as a whole becomes more responsive and usable.

Important links:

The SnappyData GitHub repo

SnappyData public Slack/Gitter/IRC Channels

SnappyData technical paper

SnappyData Documentation

SnappyData ScalaDoc

SnappyData Screencasts

132 questions
1 vote · 2 answers

Understanding the # of buckets for my SnappyData table?

The default # of buckets is 113. Why? Why not 110? Does the bucket logic perform better with a certain "divisible by" value? There are a lot of examples in SnappyData with fewer buckets. Why is that? What logic went into determining to use less…
Jason
1 vote · 1 answer

Unable to perform NamedParameterJdbcTemplate UPDATE

I am trying to perform an UPDATE to a SnappyData row table via JDBC APIs and Spring's NamedParameterJdbcTemplate. Error is: Caused by: java.sql.SQLException: (SQLState=XCL14 Severity=20000) The column position '1' is out of range. The number of…
Jason
1 vote · 1 answer

SnappyData - streaming table parameter "topics" clarification

I was reading through the documentation (http://snappydatainc.github.io/snappydata/streamingWithSQL/) and wanted to know what the parameter ":01" means after the topic name when working with Kafka. Is this a partition number or number of threads…
mike w
1 vote · 1 answer

SnappyData - inconsistent results deleting from column table

Reading through the documentation --> http://snappydatainc.github.io/snappydata/rowAndColumnTables/#row-and-column-tables I see that we should be able to perform DELETE FROM ... SQL statements against row and column tables. I am seeing…
mike w
1 vote · 1 answer

Cannot configure logging on SnappyData using conf/log4j.properties

I am following this documentation to configure logging in snappydata: http://snappydatainc.github.io/snappydata/configuration/#logging When I change conf/log4j.properties.template to "log4j.properties" and then stop/start all back up, I get ZERO…
Jason
1 vote · 1 answer

How to pass directives to snappy_ec2 created clusters

We have a need to set some directives in the snappy config files for the various components (servers, locators, etc). The snappy_ec2 scripts do a good job at creating all of the configs and keeping them in sync across the cluster, but I need to…
mrdoug
1 vote · 1 answer

How do I pass key/value config settings when submitting a Job to Snappy Job Server?

I have a job that loads a data file from a different location each time. I'd like to submit the same job JAR and just pass a different location to it using the Config.java parameter of the runJavaJob() API. I do not see a way to pass key/value…
Jason
1 vote · 1 answer

Where do Spark components live in SnappyData Unified Cluster mode?

I'm trying to understand where all the "Spark" pieces fit into SnappyData's "Unified Cluster Mode" deployment topology. In reading this, the documentation is unclear about a few…
Jason
1 vote · 1 answer

Can't find Jetty's GzipHandler class when running JUnit against SnappyData

SnappyData v.0-5. My goal is to run a snappydata driver program to connect up to SnappyData in a remote server. I wrote a JUnit test to do this. However, when I run it, I get an error with the SparkContext is…
Jason
1 vote · 1 answer

Unable to submit a simple SnappySQLJob

I am unable to submit a simple job that just performs a System.out.println(). Here is the error I get back from the SnappyData Lead. snappy-job.sh submit --lead 10.0.18.66:8090 --app-name SimpleJobApp --class snappydata.jobs.SimpleJob --app-jar…
Jason
1 vote · 1 answer

org.apache.spark.sql.catalyst.TableIdentifier cannot be resolved error in SnappySQLJob

I have a compile-time error trying to write a SnappySQLJob. Am I missing a dependency? The error message is: The type org.apache.spark.sql.catalyst.TableIdentifier cannot be resolved. It is indirectly referenced from required .class…
Jason
1 vote · 1 answer

How do I PARTITION_BY multiple column names in a Column table?

This documentation states: http://snappydatainc.github.io/snappydata/rowAndColumnTables/ "Use the PARTITION_BY {COLUMN} clause to provide a set of column names that will determine the partitioning" I want the following columns to be the partition…
Jason
1 vote · 1 answer

How do I partition data in a column table in SnappyData?

I am unable to figure out the syntax to partition my 'column' table. Here is an example that fails on me as well as many variations on it. CREATE TABLE SENSOR_DATA_COL_BY_YEAR USING column OPTIONS(PARTITION_BY year_num, buckets '11') AS (SELECT…
Jason
1 vote · 1 answer

Maven artifacts for only a SnappyData client?

Today, I am loading a pom with these dependencies. However, the SpringBoot jar created from this is huge and I think it is because it literally contains all the Snappy Store jars, etc. The SpringBoot Jar that is built bundles all the Jetty jars…
Jason
1 vote · 2 answers

gemfirexd client log not logging

I am running this from Ubuntu. The GemFire client log is not logging. Is there something incorrect in my syntax or property settings? java -jar sample-snappydata-sensor-0.0.1-SNAPSHOT.jar…
Jason