Questions tagged [cassandra]

Apache Cassandra is a highly scalable, eventually consistent, distributed, structured row store. Questions about Cassandra server administration should be asked on https://dba.stackexchange.com/questions/tagged/cassandra .

Apache Cassandra is a highly scalable, eventually consistent, distributed, structured row/column store. Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google's BigTable. Like Dynamo, Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.

Cassandra's Dynamo-based cluster model provides linear scalability and fault tolerance on commodity hardware or cloud infrastructure. Its support for replicating across multiple data centers is best-in-class, providing low latency and the ability to survive entire data center outages.

Cassandra's data model offers the convenience of column indexes, the performance of log-structured updates, and powerful built-in caching. Its write performance is very fast compared to other database solutions, which makes it a compelling option for big data processing. It scales linearly, and nodes can be added or removed on the fly without downtime.

Cassandra was open-sourced by Facebook in 2008 and quickly became a top-level Apache project. Today, it's widely used by companies in many markets.

Official links:

Documentation

20596 questions
41
votes
9 answers

MongoDB vs. Redis vs. Cassandra for a fast-write, temporary row storage solution

I'm building a system that tracks and verifies ad impressions and clicks. This means that there are a lot of insert commands (about 90/second average, peaking at 250) and some read operations, but the focus is on performance and making it…
Mark Bao
  • 896
  • 1
  • 10
  • 19
41
votes
4 answers

Not enough replica available for query at consistency ONE (1 required but only 0 alive)

I have a Cassandra cluster with three nodes, two of which are up. They are all in the same DC. When my Java application goes to write to the cluster, I get an error in my application that seems to be caused by some problem with Cassandra: Caused by:…
timsterc
  • 963
  • 3
  • 10
  • 18
41
votes
5 answers

Why was Cassandra written in Java?

Question about Cassandra: why the hell on earth would anybody write a database ENGINE in Java? I can understand why you would want to have a Java interface, but the engine... I was under the impression that there's nothing faster than C/C++, and…
Stefan Steiger
  • 78,642
  • 66
  • 377
  • 442
41
votes
4 answers

Write timeout thrown by cassandra datastax driver

While doing a bulk load of data, incrementing counters based on log data, I am encountering a timeout exception. I'm using the DataStax 2.0-rc2 Java driver. Is this an issue with the server not being able to keep up (i.e., a server-side config issue), or…
Jay
  • 19,649
  • 38
  • 121
  • 184
41
votes
1 answer

Getting Cassandra datacenter name in cqlsh

How can I get the name of the datacenter in cqlsh? It's required for the constructor of DCAwareRoundRobinPolicy.
palacsint
  • 28,416
  • 10
  • 82
  • 109
40
votes
5 answers

Import and export schema in cassandra

How do I import and export a schema from Cassandra or from the Cassandra cqlsh prompt?
vpggopal
  • 437
  • 1
  • 4
  • 3
39
votes
1 answer

How to choose between Cassandra, Membase, Hadoop, MongoDB, RDBMS etc.?

Is there a paper or blog post on when to use Cassandra, Membase, Hadoop, or plain old relational databases? Is there a paper discussing the strengths and weaknesses of each, and in which scenarios each of these technologies should be chosen? I am…
Sankar
  • 6,192
  • 12
  • 65
  • 89
39
votes
3 answers

What is the relationship between Spark, Hadoop and Cassandra

My understanding was that Spark is an alternative to Hadoop. However, when trying to install Spark, the installation page asks for an existing Hadoop installation. I'm not able to find anything that clarifies that relationship. Secondly, Spark…
Shahbaz
  • 10,395
  • 21
  • 54
  • 83
39
votes
6 answers

What's the difference between creating a table and creating a columnfamily in Cassandra?

I need details from both the performance and the query perspective. I learnt from some site that only a key can be given when using a columnfamily. If so, what would you suggest for my keyspace? I need to use group by, order by, count, sum, ifnull, concat,…
kumar
  • 2,905
  • 5
  • 22
  • 26
39
votes
6 answers

storing massive ordered time series data in bigtable derivatives

I am trying to figure out exactly what these newfangled data stores such as BigTable, HBase and Cassandra really are. I work with massive amounts of stock market data, billions of rows of price/quote data that can add up to 100s of gigabytes every…
Shahbaz
  • 10,395
  • 21
  • 54
  • 83
38
votes
3 answers

Cassandra "no viable alternative at input"

I am trying to insert a simple row into the table. Can someone point out what is happening here? CREATE TABLE recommendation_engine_poc.user_by_category ( game_category text, customer_id text, amount double, …
Adelin
  • 18,144
  • 26
  • 115
  • 175
37
votes
3 answers

cassandra get all records in time range

I have to work with a column family that has (user_id, timestamp) as key. In my query I would like to fetch all records in a given time range independent of the user_id. This is the exact table schema: CREATE TABLE userlog ( user_id text, ts…
Faber
  • 1,504
  • 2
  • 13
  • 21
37
votes
3 answers

Cassandra seed nodes and clients connecting to nodes

I'm a little confused about Cassandra seed nodes and how clients are meant to connect to the cluster. I can't seem to find this bit of information in the documentation. Do the clients only contain a list of the seed node and each node delegates a…
gak
  • 32,061
  • 28
  • 119
  • 154
36
votes
5 answers

Cassandra - transaction support

I am going through Apache Cassandra and working on sample data insertion, retrieval, etc. The documentation is very limited. I would like to know whether we can completely replace a relational DB like MySQL/Oracle with Cassandra. Does Cassandra support…
Kumar D
  • 1,308
  • 5
  • 19
  • 43
36
votes
2 answers

Cassandra has a limit of 2 billion cells per partition, but what's a partition?

The Cassandra Wiki says that there is a limit of 2 billion cells (rows x columns) per partition, but it is unclear to me what a partition is. Do we have one partition per node per column family, which would mean that the max size of a column…
Benoit Thiery
  • 6,325
  • 4
  • 22
  • 28