Questions tagged [snappydata]

SnappyData is an open source integration of the GemFireXD in-memory database and the Apache Spark cluster computing system for OLTP, OLAP, and Approximate Query Processing workloads.

From https://github.com/SnappyDataInc/snappydata

SnappyData is a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing), and OLAP (online analytical processing) in a single integrated, highly concurrent, highly available cluster. This platform is realized through a seamless integration of Apache Spark (as a big data computational engine) with GemFireXD (as an in-memory transactional store with scale-out SQL semantics).

Within SnappyData, GemFireXD runs in the same JVM as the Spark executors. This allows for optimal performance when moving data in and out of Spark executors and makes the overall architecture simpler. Spark jobs are expected to run inside SnappyData, though the SnappyData database can also be accessed directly using SQL via ODBC/JDBC, Thrift, or REST, without going through Spark.

SnappyData packages Approximate Query Processing (AQP) technology. The basic idea behind AQP is that one can use statistical sampling techniques and probabilistic data structures to answer aggregate-class queries without needing to store or operate over the entire data set. This approach trades off query accuracy for quicker response times, allowing queries to run on large data sets while still returning meaningful, accurate error information. A real-world example is the political polls run by Gallup and others, where a small sample is used to estimate support for a candidate within a small margin of error.

It's important to note that not all SQL queries can be answered through AQP, but by routing a suitable subset of the queries hitting the database to the AQP module, the system as a whole becomes more responsive and usable.
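The sampling idea behind AQP can be illustrated with a toy sketch (this is not SnappyData's actual implementation, just the underlying statistics): estimate an aggregate over a large data set from a small uniform sample, and attach a confidence interval as the "error information".

```python
import random
import statistics

# Toy illustration of the AQP idea: estimate AVG(value) over a large
# "table" from a 0.1% uniform sample, with a ~95% confidence interval.
random.seed(42)
table = [random.gauss(100.0, 15.0) for _ in range(1_000_000)]  # full data set

sample = random.sample(table, 1_000)          # small uniform sample
estimate = statistics.mean(sample)            # approximate answer

# Standard error of the mean yields a ~95% confidence interval (+/- 1.96 SE).
se = statistics.stdev(sample) / len(sample) ** 0.5
low, high = estimate - 1.96 * se, estimate + 1.96 * se

exact = statistics.mean(table)                # what a full scan would return
print(f"exact avg = {exact:.2f}")
print(f"estimate  = {estimate:.2f} (95% CI {low:.2f}..{high:.2f})")
```

The sample-based estimate touches 1,000 rows instead of 1,000,000, which is the accuracy-for-latency trade AQP makes; a production system additionally maintains stratified samples and sketches so the error bounds hold for skewed data.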

Important links:

The SnappyData Github Repo

SnappyData public Slack/Gitter/IRC Channels

SnappyData technical paper

SnappyData Documentation

SnappyData ScalaDoc

SnappyData Screencasts

132 questions
2 votes · 2 answers

How to create a table from a CSV?

SnappyData v.0.5 I want to do something similar to loading parquet files as found in the QuickStart load scripts. CREATE TABLE STAGING_AIRLINEREF USING parquet OPTIONS(path '../../quickstart/data/airportcodeParquetData'); But, I have CSV files…
Jason • 2,006 • 3 • 21 • 36

1 vote · 1 answer

SnappyData REST API to Submit Job

I am trying to submit Snappy Job using REST API. We have been able to submit SnappyJob using snappy-job submit Command Line tool. I could not find any documentation how to do the same thing through REST API. I found somewhere mentioned in the…
Frans • 43 • 3

1 vote · 1 answer

How can I get external table jdbc url in SnappyData

Previously I created an external table in SnappyData like this: create external table EXT_DIM_CITY using jdbc options(url 'jdbc:mysql://***:5002/***?user=***&password=***', driver 'com.mysql.jdbc.Driver', dbtable 'dim_city'); but now I forget the…
Jiang Yan • 31 • 5

1 vote · 2 answers

getting error while importing data into snappy data from csv in java

My table schema in scala is snSession.sql("create table category_subscriber( id int, catId int, brandId int, domains int, osId int, rType int,rTime int, ctId int, icmpId int, setId int,rAt int, cyId int) USING column OPTIONS (BUCKETS …
pavs • 141 • 1 • 8

1 vote · 1 answer

getting error while json object insertion through java in snappy data

I have a table which contains json object and array as data types for two fields.My table schema in scala is like snSession.sql("CREATE TABLE subscriber_new14 (ID int,skills Map ) USING column OPTIONS (PARTITION_BY 'ID',OVERFLOW…
pavs • 141 • 1 • 8

1 vote · 1 answer

How to load JSON data in snappydata table with rowstore mode using sql query?

I'm using snappydata.I have 1M rows of JSON file,i want to load that JSON file to snappydata table using sql(snappydata sql).
Ashu • 347 • 1 • 9

1 vote · 1 answer

How to enable query execution time in snappydata shell?

I want to know query execution time in snappy data. Is there any option which shows time in seconds?
Ashu • 347 • 1 • 9

1 vote · 1 answer

Snappydata and external Hive compatibility

I'm trying to use Snappydata 1.0.1 to read and process data from Hadoop (HDP 2.6.3). When pointing to Hive metastore (via hive-site.xml in Snappydata config) Spark from Snappydata distribution can read list of databases, but cannot create table in…
Valentin P. • 1,131 • 9 • 18

1 vote · 2 answers

Starting Snappydata by using inline arguments

On starting snappydata using the command line arguments I am getting the below error ERROR 38000: (SQLState=38000 Severity=20000) (Server=/X.X.X.157[1528] Thread=ThriftProcessor-0) The exception 'com.gemstone.gemfire.cache.TimeoutException: The…
1012ankur • 23 • 4

1 vote · 1 answer

Snappydata store with hive metastore from existing spark installation

I am using snappydata-1.0.1 on HDP2.6.2, spark 2.1.1 and was able to connect from an external spark application. But when i enable hive support by adding hive-site.xml to spark conf, snappysession is listing the tables from hivemetastore instead of…
1 vote · 1 answer

Theta Sketch (Yahoo) on SnappyData

How to store Theta Sketch (Yahoo) on SnappyData's table instead of write to file? Because I generate billions of sketches every day and need to keep many millions of sketches online for real-time queries. Can anyone help me? Thanks.
Dein Tran • 13 • 2

1 vote · 2 answers

Read-Through cache with SnappyData

Can we have read-through cache behavior? Meaning application will issue sql query to SnappyData, then SnappyData will check if the data is in the cache (in SnappyData). If it is, it will return the data. If it is not, SnappyData will bring it in to…
Frans • 43 • 3

1 vote · 1 answer

How to integrate SnappyData with Kerberos

For enterprise usage, we need to integrate Kerberos for SnappyData. Do you have any documentation for doing that? Thanks
Frans • 43 • 3

1 vote · 1 answer

spring.datasource.driver-class-name for snappy data

Hi I want to use hikariCP with snappy data. Don't want to involve GemFire in between. How can be direct implementation possible with snappy data. I tried com.pivotal.gemfirexd.jdbc.ClientDriver which is working, but…
Rick • 13 • 2

1 vote · 1 answer

SnappyData per-row TTL

Is it possible to set TTL per row. Meaning the row will be automatically deleted when the TTL has passed. The TTL can be different for ever row in the same table. Thanks!
Frans • 43 • 3