Questions tagged [snappydata]

SnappyData is an open source integration of the GemFireXD in-memory database and the Apache Spark cluster computing system for OLTP, OLAP, and Approximate Query Processing workloads.

From https://github.com/SnappyDataInc/snappydata

SnappyData is a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing), and OLAP (online analytical processing) in a single integrated, highly concurrent, highly available cluster. This platform is realized through a seamless integration of Apache Spark (as a big data computational engine) with GemFireXD (as an in-memory transactional store with scale-out SQL semantics).

Within SnappyData, GemFireXD runs in the same JVM as the Spark executors. This allows for optimal performance when moving data in and out of the executors, and it keeps the overall architecture simpler. All Spark jobs run inside SnappyData, though the SnappyData database can also be accessed using SQL via ODBC/JDBC, Thrift, or REST without going through Spark.
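For illustration, a thin-client SQL query over JDBC might look like the following sketch (assuming the SnappyData client jar is on the classpath; the ClientDriver class name and default client port 1527 follow the SnappyData docs, while the "orders" table is hypothetical):

    import java.sql.DriverManager

    object JdbcCountSketch {
      def main(args: Array[String]): Unit = {
        // Register the thin-client driver (auto-loaded on JDBC 4+ classpaths).
        Class.forName("io.snappydata.jdbc.ClientDriver")
        // 1527 is the default client port on a SnappyData locator/server.
        val conn = DriverManager.getConnection("jdbc:snappydata://localhost:1527/")
        try {
          // "orders" is a hypothetical table used only for illustration.
          val rs = conn.createStatement().executeQuery("SELECT count(*) FROM orders")
          while (rs.next()) println(s"row count = ${rs.getLong(1)}")
        } finally conn.close()
      }
    }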

SnappyData packages Approximate Query Processing (AQP) technology. The basic idea behind AQP is that one can use statistical sampling techniques and probabilistic data structures to answer aggregate-class queries without needing to store or operate over the entire data set. This approach trades off query accuracy for quicker response times, allowing queries to be run on large data sets with meaningful and accurate error information. A real-world example is the political polls run by Gallup and others, where a small sample is used to estimate support for a candidate within a small margin of error.

It's important to note that not all SQL queries can be answered through AQP, but by moving a subset of queries hitting the database to the AQP module, the system as a whole becomes more responsive and usable.
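As a sketch of what this looks like in practice, based on the sample-table DDL and WITH ERROR clause described in the SnappyData docs (the base table, stratification column, and sampling fraction below are illustrative, not prescriptive):

    import org.apache.spark.sql.{SnappySession, SparkSession}

    object AqpSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("aqp-sketch").master("local[*]").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)

        // Stratified sample over a hypothetical "orders" base table:
        // qcs is the query column set, fraction the sampling rate.
        snappy.sql("""CREATE SAMPLE TABLE orders_sample ON orders
                      OPTIONS (qcs 'customer_id', fraction '0.03')""")

        // An aggregate query answered from the sample; WITH ERROR asks the
        // engine to keep the estimated relative error within 10%.
        snappy.sql("SELECT avg(amount) FROM orders WITH ERROR 0.1").show()
      }
    }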

Important links:

The SnappyData Github Repo

SnappyData public Slack/Gitter/IRC Channels

SnappyData technical paper

SnappyData Documentation

SnappyData ScalaDoc

SnappyData Screencasts

132 questions
1 vote • 3 answers

SnappyData Spark Scala java.sql.BatchUpdateException

So, I have around 35 GB of zip files, each containing 15 CSV files. I have created a Scala script that processes each of the zip files and each of the CSV files within them. The problem is that after some number of files the script…
1 vote • 1 answer

SnappyData: importing data from multiple CSV files into column or row tables

I am new to SnappyData and I am trying to import a huge amount of data into it. The data is created from different sources and stored as CSV files inside zip files from each location. Let's say that the structure of the zips is zip1, zip2... zipn…
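Regarding the import question above, a bulk load of one extracted CSV into a column table can be sketched like this (the path, schema inference, and table name are illustrative; older Spark 1.6-based releases used the com.databricks.spark.csv reader instead of the builtin csv source):

    import org.apache.spark.sql.{SaveMode, SnappySession, SparkSession}

    object CsvLoadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("csv-load").master("local[*]").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)

        // Read one extracted CSV file (path is illustrative).
        val df = snappy.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/data/extracted/part1.csv")

        // Append into a column table; "staging_data" is a made-up name.
        df.write.format("column").mode(SaveMode.Append).saveAsTable("staging_data")
      }
    }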
1 vote • 2 answers

SnappyData collocated join in a one-physical-server setup

I am joining two tables with a large number of rows (currently 100M - 1B) in SnappyData, configured on one server with 64 CPU cores and 512 GB of memory, and would like to use the collocated join. However, the description in the docs seems to imply…
user3230153 • 123 • 3 • 11
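For context on the collocated-join question above: a collocated join requires both tables to be partitioned on the join key and declared collocated, along the lines of this sketch (table names, columns, and bucket count are invented):

    import org.apache.spark.sql.{SnappySession, SparkSession}

    object ColocatedJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("colocated-join").master("local[*]").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)

        // Partition both tables on the join key; COLOCATE_WITH keeps matching
        // buckets on the same node so the join avoids a shuffle.
        snappy.sql("""CREATE TABLE t1 (k INT, v STRING)
                      USING column OPTIONS (PARTITION_BY 'k', BUCKETS '64')""")
        snappy.sql("""CREATE TABLE t2 (k INT, w STRING)
                      USING column OPTIONS (PARTITION_BY 'k', COLOCATE_WITH 't1')""")

        snappy.sql("SELECT t1.k, v, w FROM t1 JOIN t2 ON t1.k = t2.k").show()
      }
    }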
1 vote • 1 answer

SnappyData snappy-sql PUT INTO causes error: spark.sql.execution.id is already set

I was using the SnappyData SQL shell (snappy-sql) and running SQL statements (PUT INTO) when I ran into this error: ERROR 38000: (SQLState=38000 Severity=20000) (Server=localhost/127.0.0.1[1528] Thread=pool-3-thread-3) The exception…
user3230153 • 123 • 3 • 11
1 vote • 2 answers

SnappyData SQL PUT INTO not updating values

Hi, I am using SnappyData and trying to update Table_A with rows from Table_B: Table_A(key1, key2, val, primary key(key1, key2)) -- cumulative results; Table_B(key1, key2, val, primary key(key1, key2)) -- new rows / updates. Since Table_B would…
user3230153 • 123 • 3 • 11
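For reference on the two PUT INTO questions above: PUT INTO is SnappyData's upsert, where rows from the SELECT replace target rows with the same primary key. A sketch against the schemas in the question:

    import org.apache.spark.sql.{SnappySession, SparkSession}

    object PutIntoSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("put-into").master("local[*]").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)

        // Row tables with a composite primary key, as in the question.
        snappy.sql("""CREATE TABLE table_a (key1 INT, key2 INT, val INT,
                      PRIMARY KEY (key1, key2)) USING row""")
        snappy.sql("""CREATE TABLE table_b (key1 INT, key2 INT, val INT,
                      PRIMARY KEY (key1, key2)) USING row""")

        // Upsert: new (key1, key2) pairs are inserted, existing ones replaced.
        snappy.sql("PUT INTO table_a SELECT key1, key2, val FROM table_b")
      }
    }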
1 vote • 2 answers

SQL: update table from another table in SnappyData

Hi, I am using SnappyData's SQL utility to update my table from another table, say update Table_A with rows from Table_B. Table_A(col_key, col_value) -- partitioned table with a large number of rows; Table_B(col_key, col_value) -- small batch…
user3230153 • 123 • 3 • 11
1 vote • 1 answer

SnappyData examples not working in rowstore mode

I am starting to learn SnappyData and running the SnappyData examples as per the documentation. After starting the SnappyData server with $SNAPPY_HOME/sbin/snappy-start-all.sh and running the ./bin/run-example snappydata.JDBCExample command, the…
Karthik GB • 57 • 5
1 vote • 0 answers

How to connect SnappyData to HDFS

Hi, I have started learning from the SnappyData documentation (version 0.7) with the goal of connecting SnappyData with Apache Hadoop. I am following this documentation link, but it is not very clear about how to configure the SnappyData row store for HDFS,…
Karthik GB • 57 • 5
1 vote • 0 answers

How to connect the SnappyData server with Apache Zeppelin for %snappy.sql query support

I am currently working with SnappyData's SQL query functionality. SnappyData has support for Apache Zeppelin, and all of this functionality can be used from Apache Zeppelin once it is connected to SnappyData. I configured everything as per the SnappyData…
chinna • 79 • 1 • 10
1 vote • 1 answer

SnappyData version 0.6 not found in Maven

https://github.com/SnappyDataInc/snappydata/releases I apologize in advance if this is the wrong forum for this question. We are interested in version 0.6. According to the readme, the Maven repository should contain the following…
mike w • 131 • 6
1 vote • 1 answer

Does SnappyData in HDFS read/write mode support a primary key in tables?

I was reading about the HDFS persistence mode in SnappyData and found the documentation confusing. Can one create tables with primary keys that read/write to HDFS? http://rowstore.docs.snappydata.io/docs/disk_storage/persist-hdfs-topics.html
Fire • 306 • 1 • 4 • 10
1 vote • 1 answer

SnappyData - configuring streaming job Spark settings

I can see how to configure a SparkConf when creating a streaming application (see here). I assume that I can configure the SparkConf through the SnappyStreamingContext for a streaming job, similar to a streaming application. Let's say I get a handle…
mike w • 131 • 6
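For context on the streaming question above: Spark settings are typically fixed on the SparkConf before the streaming context is created. A sketch, assuming SnappyStreamingContext can be built from a SparkContext the way StreamingContext is, with example property values only:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, SnappyStreamingContext}

    object StreamConfSketch {
      def main(args: Array[String]): Unit = {
        // Set everything before the context exists; these are sample values.
        val conf = new SparkConf()
          .setAppName("stream-sketch")
          .setMaster("local[*]")
          .set("spark.streaming.stopGracefullyOnShutdown", "true")

        val snsc = new SnappyStreamingContext(new SparkContext(conf), Seconds(1))
        // ... declare streams against snsc here ...
        snsc.start()
        snsc.awaitTermination()
      }
    }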
1 vote • 1 answer

Builtin provider com.databricks.spark.csv not found in SnappyData v0.5.2

SnappyData v0.5.2: I am using this SnappyData version to get a fix for SNAP-961. However, I am now unable to load data from a CSV after moving from the preview release v0.5 to v0.5.2. The error is: ERROR 38000: (SQLState=38000 Severity=-1) …
Jason • 2,006 • 3 • 21 • 36
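For reference on the provider question above, the DDL that exercises this lookup is roughly the following (path and options are illustrative, and on releases of that era the statement may be plain CREATE TABLE rather than CREATE EXTERNAL TABLE):

    import org.apache.spark.sql.{SnappySession, SparkSession}

    object CsvProviderSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("csv-ddl").master("local[*]").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)

        // The provider is resolved by name here; this is the lookup that
        // reportedly fails in v0.5.2. Path and options are illustrative.
        snappy.sql("""CREATE EXTERNAL TABLE stage_csv
                      USING com.databricks.spark.csv
                      OPTIONS (path '/data/input.csv', header 'true')""")
      }
    }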
1 vote • 2 answers

Does query optimization fail if a partition column is not in the WHERE predicate?

Let's say I have 20 SnappyData nodes, and I have a table like this: example_timeseries_table (id int not null, value varchar(128) not null, time timestamp not null, foo varchar(128) not null) PARTITION BY COLUMN time. And I make the query: select…
Jason • 2,006 • 3 • 21 • 36
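For context on this question and the time-series partitioning one below: pruning hinges on the partitioning column appearing in the predicate. A sketch using the question's table shape, assuming hash partitioning on the declared column:

    import org.apache.spark.sql.{SnappySession, SparkSession}

    object PruningSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("pruning").master("local[*]").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)

        snappy.sql("""CREATE TABLE example_timeseries_table (
                      id INT NOT NULL, value VARCHAR(128) NOT NULL,
                      time TIMESTAMP NOT NULL, foo VARCHAR(128) NOT NULL)
                      USING column OPTIONS (PARTITION_BY 'time')""")

        // Equality on the partitioning column can be routed to one bucket;
        // a predicate on 'foo' alone fans out to every node.
        snappy.sql("""SELECT value FROM example_timeseries_table
                      WHERE time = '2017-01-01 00:00:00'""").show()
      }
    }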
1 vote • 1 answer

What columns to PARTITION BY in a time-series table?

I want to collect time-series data and store it in the SnappyData store. I will be collecting millions of rows of data, and I want to make queries across time slices/ranges. Here is an example query I want to do: select avg(value) from…
Jason • 2,006 • 3 • 21 • 36